Abstract. We study the problem of parameter estimation for a univariate 
discretely observed ergodic diffusion process given as a solution to a stochastic 
differential equation. The estimation procedure we propose consists of two 
steps. In the first step, which is referred to as a smoothing step, we smooth 
the data and construct a nonparametric estimator of the invariant density of 
the process. In the second step, which is referred to as a matching step, we 
exploit a characterisation of the invariant density as a solution of a certain 
ordinary differential equation, replace the invariant density in this equation 
by its nonparametric estimator from the smoothing step in order to arrive at 
an intuitively appealing criterion function, and next define our estimator of 
the parameter of interest as a minimiser of this criterion function. Our main 
results show that under suitable conditions our estimator is $\sqrt{n}$-consistent, and 
even asymptotically normal. We also discuss a way of improving its asymptotic 
performance through a one-step Newton-Raphson type procedure and present 
results of a small scale simulation study. 



1. Introduction 



Stochastic differential equations play an important role in modelling various phenomena arising in fields as diverse as finance, physics, chemistry, engineering, biology, neuroscience and others, see e.g. Allen (2007), Hindriks (2011), Musiela and Rutkowski (2005) and Wong and Hajek (1985). These equations usually depend on parameters, which are often unknown. On the other hand, knowledge of these parameters is critical for the study of the process at hand, and hence their estimation based on observational data on the process under study is of great importance in practical applications. The formal setup that we consider in this paper is as follows: let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Consider a Brownian motion $W = (W_t)_{t \geq 0}$ and a random variable $\xi$ independent of $W$ that are defined on $(\Omega, \mathcal{F}, \mathbb{P})$, and let $\mathbb{F} = (\mathcal{F}_t)_{t \geq 0}$ be the augmented filtration generated by $\xi$ and $W$. Consider a stochastic differential equation driven by $W$,

$$dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t, \tag{1}$$



where $\theta \in \Theta \subset \mathbb{R}$ is an unknown parameter and $X_0 = \xi$ defines the initial condition. Assume that there exists a unique strong solution to (1) on $(\Omega, \mathcal{F}, \mathbb{P})$ with respect to the Brownian motion $W$ and initial condition $\xi$. Let $\theta_0$ denote the true parameter value. Furthermore, let $X$ be ergodic with invariant density $\pi(\cdot;\theta_0)$ and let $\xi \sim \pi(\cdot;\theta_0)$. The solution $X$ is thus a strictly stationary process. Given a discrete-time sample $X_0, X_\Delta, X_{2\Delta}, \ldots, X_{n\Delta}$ from the process $X$, our goal is to estimate the parameter $\theta_0$. Hence here we consider a parametric inference problem for a stochastic differential equation. There is also a rich body of literature on nonparametric inference for stochastic differential equations, see e.g. Comte et al. (2007), Gobet et al. (2004), Jacod (2000) and references therein. A general reference on statistical inference for ergodic diffusion processes is Kutoyants (2004).

A natural approach to estimation of $\theta_0$ is the maximum likelihood method. Assume that the transition density $p(\Delta, x, y; \theta)$ of $X$ exists. Then the likelihood function associated with observations $X_0, X_\Delta, \ldots, X_{n\Delta}$ can be written as
$$p(X_0, X_\Delta, \ldots, X_{n\Delta}; \theta) = \pi(X_0;\theta) \prod_{j=0}^{n-1} p(\Delta, X_{j\Delta}, X_{(j+1)\Delta}; \theta),$$

and the maximum likelihood estimator can be computed by maximising the right-hand side of this expression over $\theta$, provided both the invariant density and the transition density are known explicitly. Unfortunately, for many realistic and practically useful models transition densities are not available in explicit form, which makes exact computation of the maximum likelihood estimator impossible. For those cases in which the likelihood cannot be evaluated analytically, a number of alternative estimators have been proposed in the literature; they try to emulate the maximum likelihood method by relying upon some approximation of the likelihood, whence their name, approximate maximum likelihood estimators. For an overview and relevant references see e.g. Section 5 in Sørensen (2004). Although successful in a significant number of examples, these methods typically suffer from a considerable computational burden, see the brief discussion on pp. 350-351 in Sørensen (2004). We also remark that in statistical problems in general, if the likelihood is a nonlinear function of the parameter of interest, computation of the maximum likelihood estimator is often far from straightforward, see e.g. Barnett (1966). Returning to diffusion processes, even if the transition densities are explicitly known, they still might be highly nonlinear functions of the parameter $\theta$, which might render maximisation of the log-likelihood a difficult task. This is in particular true for the Cox-Ingersoll-Ross (CIR) process (see e.g. pp. 356-358 in Musiela and Rutkowski (2005) for more information on the CIR process), where the transition densities are noncentral chi-square densities; already their numerical evaluation, let alone the optimisation itself, is a nontrivial task, see Dyrting (2004).
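To make the likelihood formula above concrete, here is a minimal sketch for a case where everything is explicit: the Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ with $\theta > 0$, whose stationary law is $N(0, \sigma^2/(2\theta))$ and whose transition law over a step $\Delta$ is Gaussian with mean $e^{-\theta\Delta}x$ and variance $\sigma^2(1 - e^{-2\theta\Delta})/(2\theta)$. The function names, the simulated data and the grid search over a compact parameter set are our own illustrative choices, not part of the paper.

```python
import numpy as np


def simulate_ou(theta, sigma, delta, n, rng):
    """Exact simulation of a stationary OU sample X_0, X_delta, ..., X_{n delta}."""
    a = np.exp(-theta * delta)
    sd_stat = sigma / np.sqrt(2.0 * theta)       # sd of the invariant law N(0, sigma^2/(2 theta))
    sd_step = sd_stat * np.sqrt(1.0 - a**2)      # conditional sd over one step of length delta
    x = np.empty(n + 1)
    x[0] = rng.normal(0.0, sd_stat)              # X_0 ~ pi(.; theta): stationary start
    for j in range(n):
        x[j + 1] = a * x[j] + sd_step * rng.normal()
    return x


def ou_log_likelihood(theta, x, delta, sigma):
    """log pi(X_0; theta) + sum_j log p(delta, X_{j delta}, X_{(j+1) delta}; theta)."""
    v0 = sigma**2 / (2.0 * theta)                # stationary variance
    a = np.exp(-theta * delta)                   # conditional mean factor
    v = v0 * (1.0 - a**2)                        # conditional variance
    ll = -0.5 * (np.log(2.0 * np.pi * v0) + x[0] ** 2 / v0)
    resid = x[1:] - a * x[:-1]
    ll += np.sum(-0.5 * (np.log(2.0 * np.pi * v) + resid**2 / v))
    return ll


rng = np.random.default_rng(0)
delta, sigma = 0.1, 1.0
x = simulate_ou(theta=1.0, sigma=sigma, delta=delta, n=20000, rng=rng)
theta_grid = np.linspace(0.2, 3.0, 281)          # compact parameter set, searched on a grid
loglik = np.array([ou_log_likelihood(t, x, delta, sigma) for t in theta_grid])
theta_mle = theta_grid[np.argmax(loglik)]
```

The grid search only keeps the sketch dependency-free; for models where the transition density is not explicit, this route is exactly what becomes unavailable.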

A popular alternative to approximate maximum likelihood methods is furnished by Z-estimators, which are defined as zeroes in $\theta$ of estimating equations
$$F_n(X_0, X_\Delta, \ldots, X_{n\Delta}; \theta) = 0$$
for some given functions $F_n$. For a general introduction to Z-estimators see e.g. Chapter 5 in van der Vaart (2000). Z-estimators are often faster to compute than approximate maximum likelihood estimators, but the choice of the estimating equations is a subtle question with no readily available recipes in many cases. For instance, the existing methods at times yield choices of $F_n$ that might give rise to numerical problems or that are infeasible in practice, see remarks on pp. 343-344 in Sørensen (2004). For additional information on this approach to parameter estimation for diffusion processes and references see Bibby et al. (2010), Jacobsen (2001), Kessler (2000) and Section 4 in Sørensen (2004).
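As an illustration of a Z-estimator, the sketch below solves $F_n(\theta) = 0$ for the Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ with the simple linear estimating function $F_n(\theta) = \sum_{j=0}^{n-1} X_{j\Delta}\,(X_{(j+1)\Delta} - e^{-\theta\Delta}X_{j\Delta})$. This particular $F_n$ is our own choice for illustration and is not singled out in the paper. Since it is increasing in $\theta$, bisection finds its root; the root is also available in closed form, which lets us check the numerics.

```python
import numpy as np


def simulate_ou(theta, sigma, delta, n, rng):
    """Exact simulation of a stationary OU sample X_0, X_delta, ..., X_{n delta}."""
    a = np.exp(-theta * delta)
    sd_stat = sigma / np.sqrt(2.0 * theta)
    x = np.empty(n + 1)
    x[0] = rng.normal(0.0, sd_stat)
    for j in range(n):
        x[j + 1] = a * x[j] + sd_stat * np.sqrt(1.0 - a**2) * rng.normal()
    return x


def estimating_function(theta, x, delta):
    """F_n(theta) = sum_j X_j (X_{j+1} - exp(-theta delta) X_j); increasing in theta."""
    return np.sum(x[:-1] * (x[1:] - np.exp(-theta * delta) * x[:-1]))


def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    flo, fhi = f(lo), f(hi)
    if flo * fhi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0:
            hi = mid                      # root lies in [lo, mid]
        else:
            lo, flo = mid, fmid           # root lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
delta = 0.1
x = simulate_ou(theta=1.0, sigma=1.0, delta=delta, n=20000, rng=rng)
theta_z = bisect_root(lambda t: estimating_function(t, x, delta), lo=0.01, hi=20.0)

# Closed-form root of the same equation: exp(-theta delta) = sum X_j X_{j+1} / sum X_j^2.
theta_closed = -np.log(np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)) / delta
```

For less convenient estimating functions there may be several roots or none in a given bracket, which is the practical difficulty alluded to above.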
In the present work we study an approach that is an alternative to the ones surveyed above. In particular, we will use a characterisation of the invariant density $\pi(\cdot;\cdot)$ of (1) as a solution of the ordinary differential equation (here a prime denotes a derivative with respect to $x$)
$$\mu(x;\theta)\pi(x;\theta) - \frac{1}{2}\left[\sigma^2(x;\theta)\pi(x;\theta)\right]' = 0, \tag{2}$$
to motivate an estimator $\hat\theta_n$ of $\theta_0$ defined as
$$\hat\theta_n = \operatorname*{argmin}_{\theta \in \Theta} R_n(\theta), \tag{3}$$
where
$$R_n(\theta) = \int \left( \mu(x;\theta)\hat\pi(x) - \frac{1}{2}\left[\sigma^2(x;\theta)\hat\pi(x)\right]' \right)^2 w(x)\,dx. \tag{4}$$
Here $w(\cdot)$ is a weight function chosen beforehand and $\hat\pi(\cdot)$ is a nonparametric estimator of $\pi(\cdot;\theta_0)$; in the latter capacity we will use a kernel density estimator. The intuition behind $\hat\theta_n$ is that for $\hat\pi(\cdot)$ 'close' to $\pi(\cdot;\theta_0)$, in view of (2) the same must be true for $\hat\theta_n$ and $\theta_0$.
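The intuition above can be checked numerically. The sketch below assumes, purely for illustration, the Ornstein-Uhlenbeck drift $\mu(x;\theta) = -\theta x$ with $\sigma \equiv 1$ and a hypothetical continuously differentiable weight $w(x) = (1 - x^2/c^2)^2$ on $[-c, c]$. It evaluates a Riemann-sum version of (4) on a grid of $\theta$ values, first with the exact invariant density $\pi(\cdot;\theta_0)$ plugged in for $\hat\pi$, and then with a deliberately distorted normal density whose variance is inflated by 10%.

```python
import numpy as np

theta0, c = 1.0, 1.5
x = np.linspace(-c, c, 601)
dx = x[1] - x[0]
w = (1.0 - (x / c) ** 2) ** 2              # hypothetical C^1 weight supported on [-c, c]


def normal_density_and_derivative(x, var):
    """N(0, var) density and its x-derivative."""
    dens = np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return dens, -x / var * dens


def criterion(theta, dens, ddens):
    """Riemann sum for (4) with mu(x; theta) = -theta x and sigma = 1."""
    resid = -theta * x * dens - 0.5 * ddens     # mu * pi - (1/2) (sigma^2 pi)'
    return np.sum(resid**2 * w) * dx


theta_grid = np.linspace(0.5, 2.0, 151)
var0 = 1.0 / (2.0 * theta0)                    # OU stationary variance when sigma = 1

exact = normal_density_and_derivative(x, var0)
theta_exact = theta_grid[np.argmin([criterion(t, *exact) for t in theta_grid])]

distorted = normal_density_and_derivative(x, 1.1 * var0)
theta_distorted = theta_grid[np.argmin([criterion(t, *distorted) for t in theta_grid])]
```

With the exact density the minimiser is $\theta_0 = 1$; with the distorted one the criterion is proportional to $(\theta - \theta_0/1.1)^2$, so the minimiser moves only to about $0.91$, a roughly 9% change for a 10% error in the plugged-in variance.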

The estimator $\hat\theta_n$ will be called a smooth and match estimator. Its name reflects the fact that it is obtained through a two-step procedure: in the first step, referred to as the smoothing step, the data $Z_0, Z_1, \ldots, Z_n$ are smoothed in order to obtain a nonparametric estimator $\hat\pi(\cdot)$ of the stationary density $\pi(\cdot;\theta_0)$. In the second step, referred to as the matching step, the characterisation of $\pi(\cdot;\theta_0)$ as a solution of (2) is used, and an estimator of $\theta_0$ is obtained in such a way that the left-hand side of (2) with $\pi(\cdot;\theta_0)$ replaced by $\hat\pi(\cdot)$ approximately matches zero. The construction of the estimator $\hat\theta_n$ is motivated by a similar construction used in parameter estimation problems for ordinary differential equations, see Gugushvili and Klaassen (2012) for additional information and references. Approaches to parameter estimation for stochastic differential equations that are close in spirit to the one considered in the present work, in that they rely on matching a parametric function to its nonparametric estimator, are studied in Aït-Sahalia (1996b), Bandi and Phillips (2007), Kristensen (2010) and Sørensen (2002). We remark that our approach differs from the approaches in these papers either by the type of asymptotics or by the criterion function.

The estimator $\hat\theta_n$ is especially straightforward to compute when the drift coefficient $\mu(\cdot;\cdot)$ is linear in the components of the parameter $\theta$, see Remark 13 below. Obviously, ease of computation cannot be the sole justification for the use of any particular estimator, and hence, in order to provide more motivation for the use of our estimator, in the present work we study its asymptotic properties. Since the estimator $\hat\theta_n$ is ultimately motivated by a characterisation of the marginal density of $X$, in the most general setting, when both the drift and the dispersion coefficients in (1) depend on the parameter $\theta$, the full parameter vector $\theta$ will typically be impossible to estimate due to identifiability problems. We hence have to specialise to some particular case, and we do this for the case when the dispersion coefficient $\sigma(\cdot;\theta)$ does not depend on $\theta$ and is a known function $\sigma(\cdot)$. Thus the stochastic differential equation underlying our model is
$$dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t)\,dW_t. \tag{5}$$
The structure of the paper is as follows: in Section 2, for the reader's convenience, we list together the assumptions on our model. Detailed remarks on these assumptions are given in Section 3. When reading the paper, a reader can either browse through Section 2 or refer to it as the need arises in the subsequent sections. A reader who finds the assumptions in Section 2 believable can skip Section 3 at first reading. In Sections 4 and 5 we state the main results of the paper, namely $\sqrt{n}$-consistency and asymptotic normality of $\hat\theta_n$. In Section 6 we discuss a further asymptotic improvement of the estimator $\hat\theta_n$ through a Newton-Raphson type procedure. Results of a small simulation study are presented in Section 7. Section 8 contains proofs of the results from Sections 4 and 5. Finally, Appendix A contains several technical lemmas used in the proofs of the results from Sections 4 and 5.

We remark that in the present work we do not strive for maximal generality. 
Rather, our goal is to explore asymptotic properties of an intuitively appealing 
estimator of $\theta_0$, and to show that this estimator leads to reasonable results in a 
number of examples. 

Throughout the paper we use the following notation for derivatives: a dot denotes a derivative of an arbitrary function $q(x;\theta)$ with respect to $\theta$, while a prime denotes its derivative with respect to $x$. We also define the strong mixing coefficient $\alpha_\Delta(k)$ as
$$\alpha_\Delta(k) = \sup_{m \geq 0} \; \sup_{A \in \mathcal{F}_{\leq m},\, B \in \mathcal{F}_{\geq m+k}} \left| \mathbb{P}(AB) - \mathbb{P}(A)\mathbb{P}(B) \right|,$$
where $\mathcal{F}_{\leq m} = \sigma(Z_j, j \leq m)$ and $\mathcal{F}_{\geq m} = \sigma(Z_j, j \geq m)$ for $m \in \mathbb{N} \cup \{0\}$. Here $Z_j = X_{j\Delta}$ for $j \in \mathbb{N} \cup \{0\}$. We call the sequence $Z_j$ $\alpha$-mixing (or strongly mixing) if $\alpha_\Delta(k) \to 0$ as $k \to \infty$. When comparing two sequences $\{a_n\}$ and $\{b_n\}$ of real numbers, we will use the notation $a_n \lesssim b_n$ to denote the fact that there exists $C > 0$ such that $a_n \leq C b_n$ for all $n \in \mathbb{N}$. A similar convention will be used for $a_n \gtrsim b_n$. The notation $a_n \asymp b_n$ will denote the fact that the sequences $\{a_n\}$ and $\{b_n\}$ are asymptotically of the same order.

2. Assumptions 

In this section we list the assumptions under which the theoretical results of the 
paper are proved. 

Assumption 1. The parameter space is a compact subset of $\mathbb{R}$: $\Theta = [a, b]$ for $a < b$.

Assumption 2. The drift coefficient $\mu(\cdot;\theta)$ is known up to the parameter $\theta$ and the dispersion coefficient $\sigma(\cdot)$ is a known function. Furthermore, there exists a unique strong solution $X = (X_t)_{t \geq 0}$ to (5) on $(\Omega, \mathcal{F}, \mathbb{P})$ with respect to the Brownian motion $W$ and initial condition $\xi$. It is a homogeneous Markov process with transition density $p(t, x, y; \theta)$. Moreover, this solution is ergodic with bounded invariant density $\pi(\cdot;\cdot)$ that has a bounded, continuous and integrable derivative $\pi'(\cdot;\cdot)$, and for $\xi \sim \pi(\cdot;\theta)$ the solution $X$ is a strictly stationary process. Also, $\dot\pi(\cdot;\cdot)$ exists. Finally, for all $\theta \in \Theta$ the support of $\pi(\cdot;\theta)$, i.e. the state space of $X$, equals $\mathbb{R}$.

Assumption 3. A sample $X_0, X_\Delta, \ldots$ (here $\Delta > 0$ is fixed) from $X$ corresponding to the true parameter value $\theta_0$ is $\alpha$-mixing with strong mixing coefficients $\alpha_\Delta(k)$ satisfying the condition $\sum_{k=0}^{\infty} \alpha_\Delta(k) < \infty$.
Assumption 4. The stationary density $\pi(\cdot;\cdot)$ satisfies the condition
$$\forall t \in \mathbb{R}: \quad \left( \int \left( \pi^{(\alpha-1)}(x+t;\theta) - \pi^{(\alpha-1)}(x;\theta) \right)^2 dx \right)^{1/2} \leq L_\theta |t|,$$
for some constant $L_\theta > 0$ (that may depend on $\theta$) and some integer $\alpha \geq 3$.

Assumption 5. The kernel $K$ is symmetric and continuously differentiable, has support $[-1, 1]$ and satisfies the conditions
$$\int_{-1}^{1} K(u)\,du = 1, \qquad \int_{-1}^{1} u^l K(u)\,du = 0, \quad l = 1, \ldots, \alpha.$$
Here $\alpha$ is the same as in Assumption 4.
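A kernel meeting Assumption 5 with $\alpha = 3$ can be written down explicitly. The sketch below checks, with exact polynomial arithmetic from `numpy.polynomial`, that the candidate $K(u) = \frac{105}{64}(1-u^2)^2(1-3u^2)$ on $[-1, 1]$ (a polynomial variant of the biweight kernel, used here purely as an example) integrates to one, has vanishing moments of orders 1, 2 and 3, and is continuously differentiable on $\mathbb{R}$ because both $K$ and $K'$ vanish at $\pm 1$. Its fourth moment equals $-1/33 \neq 0$, so it would not satisfy Assumption 5 with $\alpha = 4$.

```python
from numpy.polynomial import Polynomial

# Candidate kernel on [-1, 1]: K(u) = (105/64)(1 - 5u^2 + 7u^4 - 3u^6)
#                                   = (105/64)(1 - u^2)^2 (1 - 3u^2).
K = (105.0 / 64.0) * Polynomial([1.0, 0.0, -5.0, 0.0, 7.0, 0.0, -3.0])
dK = K.deriv()


def moment(l):
    """Exact value of int_{-1}^{1} u^l K(u) du via the antiderivative of u^l K(u)."""
    antiderivative = (K * Polynomial([0.0] * l + [1.0])).integ()
    return antiderivative(1.0) - antiderivative(-1.0)


moments = [moment(l) for l in range(5)]    # l = 0, 1, 2, 3, 4
boundary = [K(1.0), K(-1.0), dK(1.0), dK(-1.0)]
```

Outside $[-1, 1]$ the kernel is set to zero; the vanishing boundary values are what make this zero extension continuously differentiable.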

Assumption 6. The bandwidth $h = h_n$ depends on $n$, and $h \downarrow 0$ as $n \to \infty$ in such a way that $nh^4 \to \infty$.

Assumption 7. The weight function $w$ is nonnegative, continuously differentiable, bounded and integrable.

Assumption 8. The invariant density $\pi(\cdot;\cdot)$ solves the differential equation
$$\mu(x;\theta)\pi(x) - \frac{1}{2}\left[\sigma^2(x)\pi(x)\right]' = 0, \tag{6}$$
where $\pi(\cdot)$ is the unknown function. Differentiability of $\sigma(\cdot)$ is also assumed.

Assumption 9. The drift coefficient $\mu(\cdot;\cdot)$ is three times differentiable with respect to $\theta$. The drift and dispersion coefficients and the corresponding derivatives are continuous functions of $x$ and $\theta$. Furthermore, there exist functions $\bar\mu_j(\cdot)$, $j = 1, \ldots, 4$, such that
$$\sup_{\theta \in \Theta} \left| \mu^{(i)}(x;\theta) \right| \leq \bar\mu_{i+1}(x), \quad \forall x \in \mathbb{R}, \tag{7}$$
for $i = 0, 1, 2, 3$, and a function $\bar\mu_5(\cdot)$ such that
$$\sup_{\theta \in \Theta} \left| \mu'(x;\theta) \right| \leq \bar\mu_5(x), \quad \forall x \in \mathbb{R}.$$
Here $\mu^{(i)}$ denotes the $i$th derivative of a function $\mu$ with respect to $\theta$ and $\mu^{(0)}(\cdot;\cdot) = \mu(\cdot;\cdot)$. Moreover, the functions

£?(>(•), a 4 (>(.), a 2 (-)(a'(-)) 2 W (-), 
fj, 2 (-)Jii{-)w(-), fi 5 (-)a 2 (-)w(-), i%(-)w(-), 

fi3(-)Jtl(-)w(-), /l 3 (-)0- 4 (-)^(-): ^3(-)^l('M-)> 

MM-Wi-M-), feCVOO, &(•)&(>(•), 

ju 4 (-)/Ii (•)«;(•), Ji4(-)o-(-)a'(-)w(-), Jia{-)o- 2 {-)w{-), 

M-)o- 2 {-)w{-), MM-y(-M-), M-)^(X-) 

are bounded and integrable. Finally, $\lim_{|x| \to \infty} \bar\mu_2(x) w(x) \sigma^2(x) = 0$.
3. Remarks on assumptions



In this section we provide remarks on the assumptions made in Section 2.

Remark 1. In Assumption 1 we assume that the parameter $\theta$ is univariate. This assumption is made only to simplify the proofs, and the results of the paper can also be generalised to the case when $\theta$ is multivariate. Compactness of the parameter space guarantees existence of our estimator $\hat\theta_n$.

Remark 2. In this remark we deal with Assumption 2. A standard condition that guarantees existence and uniqueness of a strong solution to (1) is a Lipschitz and linear growth condition on the coefficients $\mu(\cdot;\theta)$ and $\sigma(\cdot)$, together with the assumption that $\mathbb{E}[\xi^2] < \infty$, see e.g. Theorem 1 on p. 40 in Gikhman and Skorokhod (1968) or Theorem 2.9 on p. 289 in Karatzas and Shreve (1998). The same condition also implies that $X$ is a Markov process, see e.g. Theorem 1 on p. 66 in Gikhman and Skorokhod (1968), the time-homogeneity of which can be shown as on pp. 106-107 in Gikhman and Skorokhod (1968). Moreover, $X$ is a diffusion process, see Theorem 2 on p. 67 in Gikhman and Skorokhod (1968). Conditions for ergodicity of $X$ and existence of the invariant density are given e.g. in Theorem 3 on p. 143 in Gikhman and Skorokhod (1968), while those for existence of the transition density $p(\Delta, x, y; \theta)$, as well as its characterisations, can be found in §13 of Chapter 3 of Gikhman and Skorokhod (1968). Ergodicity is a standard assumption in parameter estimation problems for diffusion processes from discrete-time observations, at least in problems with $\Delta$ fixed. The condition in Assumption 2 that the support of $\pi(\cdot;\theta)$ equals $\mathbb{R}$ for every $\theta \in \Theta$ is a purely technical one and is needed only to avoid extra technicalities when dealing with the boundary bias effects characteristic of kernel density estimators. This condition is for instance satisfied when the process $X$ is an Ornstein-Uhlenbeck process,
$$dX_t = -\theta X_t\,dt + \sigma\,dW_t, \tag{8}$$
with $\theta > 0$ and known $\sigma$, because in this case $\pi(x;\theta)$ is a normal density with mean $0$ and variance $\sigma^2/(2\theta)$, see Proposition 5.1 on p. 219 in Karlin and Taylor (1981) or Example 4 on p. 221 there. For more information on the Ornstein-Uhlenbeck process see Example 6.8 on p. 358 in Karatzas and Shreve (1998) or the results on the Ornstein-Uhlenbeck process scattered throughout Karlin and Taylor (1981). In the financial literature a slight generalisation of the Ornstein-Uhlenbeck process is used to model the dynamics of the short interest rate, and the corresponding model is known as the Vasicek model, see for instance pp. 350-355 in Musiela and Rutkowski (2005). The general case when the support of $\pi(\cdot;\theta)$ does not coincide with $\mathbb{R}$, as for instance for the CIR process, where it equals $(0, \infty)$, can be dealt with using the same approach as in the present work in combination with a boundary bias correction method that uses a kernel with special properties, see e.g. Gasser et al. (1985). An alternative in the case when the state space of $X$ is $(0, \infty)$ is to use the transformation $Y_t = \log X_t$. The process $Y$ has state space $\mathbb{R}$, and its governing stochastic differential equation can be obtained through Itô's formula. □
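For completeness, here is the Itô computation for a generic drift $\mu$ and dispersion $\sigma$ on $(0, \infty)$ (suppressing $\theta$), using $f(x) = \log x$, $f'(x) = 1/x$ and $f''(x) = -1/x^2$. The CIR parametrisation in the second line, with constants $\beta_1, \beta_2, \gamma$, is our own illustrative notation.

```latex
\begin{aligned}
dY_t &= \frac{1}{X_t}\,dX_t - \frac{1}{2}\,\frac{\sigma^2(X_t)}{X_t^2}\,dt
      = \Bigl( e^{-Y_t}\mu(e^{Y_t}) - \tfrac{1}{2}\,e^{-2Y_t}\sigma^2(e^{Y_t}) \Bigr)\,dt
        + e^{-Y_t}\sigma(e^{Y_t})\,dW_t, \\
\text{CIR: } dX_t &= (\beta_1 - \beta_2 X_t)\,dt + \gamma\sqrt{X_t}\,dW_t
\;\Longrightarrow\;
dY_t = \Bigl( \bigl(\beta_1 - \tfrac{\gamma^2}{2}\bigr)e^{-Y_t} - \beta_2 \Bigr)\,dt
        + \gamma\, e^{-Y_t/2}\,dW_t .
\end{aligned}
```

When $\gamma$ is known, the dispersion coefficient $\gamma e^{-y/2}$ of $Y$ is a known function, so the transformed equation is again of the form (5).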

Remark 3. Assumption 3 implies certain restrictions on the rate of decay of the $\alpha$-mixing coefficients $\alpha_\Delta(k)$. Conditions yielding information on their rate of decay can be obtained for instance from the corresponding results for the $\beta$-mixing coefficients $\beta(s)$ of the process $X$. The $\beta$-mixing coefficient $\beta(s)$ (attributed to Kolmogorov in Rozanov and Volkonskii (1959) and alternatively called the absolute regularity coefficient) of the process $X$ is defined as follows,
$$\beta(s) = \sup_{t \geq 0} \mathbb{E}\left[ \sup_{B \in \mathcal{F}_{\geq s+t}} \left| \mathbb{P}(B \mid \mathcal{F}_{\leq t}) - \mathbb{P}(B) \right| \right],$$
where $\mathcal{F}_{\geq s+t} = \sigma(X_u, u \geq s+t)$, $\mathcal{F}_{\leq t} = \sigma(X_u, u \leq t)$ and $\mathbb{P}(\cdot \mid \mathcal{F}_{\leq t})$ is the regular conditional probability on $\mathcal{F}_{\geq s+t}$ given $\mathcal{F}_{\leq t}$ (the latter exists in our context by Theorem 3.19 on pp. 307-308 in Karatzas and Shreve (1998)). Theorem 1 in Veretennikov (1997) gives a sufficient condition on the drift coefficient (satisfied for instance in the case of the Ornstein-Uhlenbeck process), which entails a bound
$$\beta(s) \leq \frac{C}{(1+s)^{\kappa+1}}, \tag{9}$$
where $C$ is a constant independent of $s$ and $\kappa$ depends in a simple way on the drift coefficient. The $\alpha$-mixing coefficient $\alpha(s)$ (introduced in Rosenblatt (1956)) is defined as
$$\alpha(s) = \sup_{t \geq 0} \; \sup_{A \in \mathcal{F}_{\leq t},\, B \in \mathcal{F}_{\geq s+t}} \left| \mathbb{P}(AB) - \mathbb{P}(A)\mathbb{P}(B) \right|.$$
The following inequality is well known: $2\alpha(s) \leq \beta(s)$, see Proposition 1 on p. 4 in Doukhan (1994). Since one trivially has $\alpha_\Delta(k) \leq \alpha(k\Delta)$, it follows that $\alpha_\Delta(k) \leq \frac{1}{2}\beta(k\Delta)$. Therefore, by (9), in this case $\sum_{k=0}^{\infty} \alpha_\Delta(k) \leq \frac{C}{2}\sum_{k=0}^{\infty} (1+k\Delta)^{-(\kappa+1)} < \infty$ for $\kappa > 0$, i.e. the requirement in Assumption 3 holds. □

Remark 4. This remark deals with Assumption 4. Viewing $\theta$ as fixed, conditions under which the invariant density $\pi(x;\theta)$ is infinitely differentiable with respect to $x$ can be found in Theorem 3 of Kusuoka and Yoshida (2000). In simple cases like that of the Ornstein-Uhlenbeck process (8), the regularity assumptions can and have to be checked by direct calculation. The requirement that $\alpha \geq 3$ is needed in order to establish Theorem 2. Under Assumption 4 the stationary density $\pi(\cdot;\theta)$ belongs to the Nikol'ski class $\mathcal{H}(\alpha, L)$ as defined e.g. in Definition 1.4 in Tsybakov (2009). Another possibility is to assume that the invariant density $\pi(\cdot;\theta)$ is $\alpha$ times differentiable with a continuous, bounded and square integrable derivative of order $\alpha$, see e.g. paragraph VI.4 on p. 79 and Theorem VI.5 on p. 80 in Bosq and Lecoutre (1987). In case the weight function $w$ has compact support, Lemma 1 (which is a basic result used in the proofs of the main statements of the paper) can also be proved under the assumption
$$\forall x, t \in \mathbb{R}: \quad \left| \pi^{(\alpha-1)}(x+t;\theta) - \pi^{(\alpha-1)}(x;\theta) \right| \leq L_\theta |t|,$$
i.e. the assumption that the density $\pi(\cdot;\theta)$ belongs to the Hölder class $\Sigma(\alpha, L)$ as defined e.g. in Definition 1.2 in Tsybakov (2009). However, if $w$ has compact support, our analysis does not use all the information supplied by the stationary density. This might require stronger conditions on the drift and dispersion coefficients $\mu(\cdot;\cdot)$ and $\sigma(\cdot)$ in order for the identifiability condition (14) in the statement of Theorem 1 to hold true, and hence it is preferable to keep $w$ general. □

Remark 5. Assumption 5 is a standard condition in kernel estimation, see e.g. p. 13 in Tsybakov (2009). A kernel $K$ satisfying Assumption 5 is called a kernel of order $\alpha$. For a method of its construction see Section 1.2.2 in Tsybakov (2009). □

Remark 6. Assumption 6 is needed in order to establish consistency of the estimators $\hat\pi(\cdot)$ and $\hat\pi'(\cdot)$, see Lemma 1. □
Remark 7. This remark deals with Assumption 7. In practice, when implementing the estimator $\hat\theta_n$, one would typically use $w$ with compact support. See Section 7 for details. □

Remark 8. Sufficient conditions guaranteeing (6) in Assumption 8 can be gleaned from Banon (1978), see Lemma 3.2 there, and involve regularity conditions on the drift coefficient $\mu(\cdot;\cdot)$ and dispersion coefficient $\sigma(\cdot)$. Note that for simple cases like the Ornstein-Uhlenbeck process (8), where an explicit formula for the invariant density is available, Assumption 8 can also be verified directly. □
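As an example of such a direct verification, the sketch below (illustrative values $\theta = 1.3$ and $\sigma = 0.8$, chosen by us) evaluates the left-hand side of (6) for the Ornstein-Uhlenbeck process (8) on a grid, using the normal density with variance $\sigma^2/(2\theta)$ and a central finite difference for $(\sigma^2\pi)'$. The residual vanishes up to discretisation error, whereas evaluating the drift at the wrong value $2\theta$ leaves a clearly nonzero residual.

```python
import numpy as np

theta, sigma = 1.3, 0.8                      # illustrative parameter values
var = sigma**2 / (2.0 * theta)               # invariant variance of (8)


def pi(x):
    """Invariant density of the OU process (8): N(0, sigma^2 / (2 theta))."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


x = np.linspace(-3.0, 3.0, 601)
eps = 1e-5
# Central difference for (sigma^2 pi)'(x); sigma is constant in (8).
d_sigma2_pi = sigma**2 * (pi(x + eps) - pi(x - eps)) / (2.0 * eps)

# Left-hand side of (6) with mu(x; theta) = -theta x.
residual = -theta * x * pi(x) - 0.5 * d_sigma2_pi
max_abs_residual = np.max(np.abs(residual))

# Same check with the drift evaluated at the wrong parameter value 2 * theta.
residual_wrong = -2.0 * theta * x * pi(x) - 0.5 * d_sigma2_pi
max_abs_residual_wrong = np.max(np.abs(residual_wrong))
```

Analytically, $\pi'(x) = -(2\theta x/\sigma^2)\pi(x)$, so $\frac{1}{2}\sigma^2\pi'(x) = -\theta x \pi(x)$ and the left-hand side of (6) is identically zero.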

Remark 9. The conditions on the drift and dispersion coefficients made in Assumption 9 are used to prove the asymptotic results of the paper. With an appropriate choice of the weight function $w(\cdot)$ they are satisfied in a number of interesting examples, for instance in the case of the Ornstein-Uhlenbeck process (8) with $\theta > 0$ unknown and $\sigma$ known. Examination of the proofs shows that the complicated conditions in Assumption 9 can be significantly simplified if the weight function $w$ is taken to have compact support. Note also that, because of the great flexibility in the selection of the weight function $w$, Assumption 9 will be satisfied in a large number of examples. □

4. Consistency 

Let $K$ be a kernel function and let a number $h > 0$ (that depends on $n$) be a bandwidth. To construct our estimator of $\theta_0$, we first need to construct a nonparametric estimator of the stationary density $\pi(\cdot;\theta_0)$. The stationary density $\pi(\cdot;\theta_0)$ will be estimated by a kernel density estimator
$$\hat\pi(x) = \frac{1}{(n+1)h} \sum_{j=0}^{n} K\left( \frac{x - Z_j}{h} \right),$$
while $\hat\pi'(\cdot)$ will serve as an estimator of $\pi'(\cdot;\theta_0)$ (we assume that $K(\cdot)$ is differentiable). Kernel density estimators are among the most popular nonparametric density estimators, see e.g. Chapter 1 in Tsybakov (2009) for an introduction in the i.i.d. case and Section 2 in Chapter 4 of Györfi et al. (1990) for the case of dependent identically distributed observations.
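A minimal sketch of the smoothing step, assuming the illustrative kernel $K(u) = \frac{105}{64}(1-u^2)^2(1-3u^2)$ on $[-1, 1]$ (our choice; it satisfies Assumption 5 with $\alpha = 3$), a bandwidth $h = 0.3$ and simulated Ornstein-Uhlenbeck data, none of which come from the paper. It evaluates $\hat\pi$ and $\hat\pi'$ on a grid by direct summation. Because $\int K = 1$ and the first moments of $K$ vanish, $\int \hat\pi = 1$, $\int \hat\pi' = 0$ and $\int x^2 \hat\pi(x)\,dx = (n+1)^{-1}\sum_j Z_j^2$ hold exactly, and the usage lines check these identities up to the Riemann-sum error.

```python
import numpy as np


def kernel(u):
    """K(u) = (105/64)(1 - u^2)^2 (1 - 3u^2) on [-1, 1], zero outside."""
    inside = np.abs(u) <= 1.0
    return np.where(inside, (105.0 / 64.0) * (1 - u**2) ** 2 * (1 - 3 * u**2), 0.0)


def kernel_deriv(u):
    """K'(u) = (105/64)(-10u + 28u^3 - 18u^5) on [-1, 1], zero outside."""
    inside = np.abs(u) <= 1.0
    return np.where(inside, (105.0 / 64.0) * (-10 * u + 28 * u**3 - 18 * u**5), 0.0)


def kde_with_derivative(z, grid, h):
    """Evaluate pi_hat and pi_hat' on a grid from observations z = (Z_0, ..., Z_n)."""
    m = z.size                                   # m = n + 1 observations
    pi_hat = np.empty(grid.size)
    dpi_hat = np.empty(grid.size)
    for i, xi in enumerate(grid):                # one grid point at a time keeps memory small
        u = (xi - z) / h
        pi_hat[i] = kernel(u).sum() / (m * h)
        dpi_hat[i] = kernel_deriv(u).sum() / (m * h**2)   # d/dx K((x - Z)/h) = K'(u) / h
    return pi_hat, dpi_hat


def simulate_ou(theta, sigma, delta, n, rng):
    """Exact simulation of a stationary OU sample Z_j = X_{j delta}, j = 0..n."""
    a = np.exp(-theta * delta)
    sd_stat = sigma / np.sqrt(2.0 * theta)
    z = np.empty(n + 1)
    z[0] = rng.normal(0.0, sd_stat)
    for j in range(n):
        z[j + 1] = a * z[j] + sd_stat * np.sqrt(1.0 - a**2) * rng.normal()
    return z


rng = np.random.default_rng(1)
z = simulate_ou(theta=1.0, sigma=1.0, delta=0.5, n=5000, rng=rng)
grid = np.linspace(-6.0, 6.0, 1201)
dx = grid[1] - grid[0]
pi_hat, dpi_hat = kde_with_derivative(z, grid, h=0.3)

total_mass = pi_hat.sum() * dx                   # close to 1
total_slope = dpi_hat.sum() * dx                 # close to 0
second_moment = (grid**2 * pi_hat).sum() * dx    # close to mean(z**2)
```

Note that a higher-order kernel takes negative values, so $\hat\pi$ can be slightly negative in the tails; this does no harm in the criterion (4).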

In the sequel we will need to know the convergence rates of the estimator $\hat\pi(\cdot)$ and its derivative $\hat\pi'(\cdot)$ in the weighted $L_2$-norm with weight function $w$. As usual in nonparametric density estimation, to that end some degree of smoothness of the stationary density $\pi(\cdot;\cdot)$, as well as appropriate conditions on the kernel $K$, bandwidth $h$ and weight function $w$, are needed. These are supplied in Section 2. Furthermore, to establish useful asymptotic properties of the estimator $\hat\pi(\cdot)$ and its derivative $\hat\pi'(\cdot)$, some further assumptions on the observations have to be made. We will assume that the sequence $Z_j = X_{j\Delta}$ is strongly mixing with mixing coefficients satisfying the condition spelled out in Section 2.

The following result holds true. 

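Lemma 1. Under the assumptions of Section 2 we have
$$\mathbb{E}\left[ \int \left( \hat\pi(x) - \pi(x;\theta_0) \right)^2 w(x)\,dx \right] \lesssim h^{2\alpha} + \frac{1}{nh^2} \tag{10}$$
and
$$\mathbb{E}\left[ \int \left( \hat\pi'(x) - \pi'(x;\theta_0) \right)^2 w(x)\,dx \right] \lesssim h^{2(\alpha-1)} + \frac{1}{nh^4}. \tag{11}$$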
Remark 10. The bound in inequality (10), and by extension in inequality (11), can be sharpened by using more refined arguments in the proof of Lemma 1, such as Theorem 3 on p. 9 in Doukhan (1994). However, the 'usual' order bound on the mean integrated squared error in kernel density estimation for i.i.d. observations when the unknown density is 'smooth of order $\alpha$', i.e.
$$\mathbb{E}\left[ \int \left( \hat\pi(x) - \pi(x;\theta_0) \right)^2 w(x)\,dx \right] \lesssim h^{2\alpha} + \frac{1}{nh}, \tag{12}$$
see e.g. Theorem 1.3 in Tsybakov (2009), does not seem to be obtainable without further conditions. For dependent observations the bound (12) holds by Theorem 6.6 in Viennet (1997), which, however, is proved under a $\beta$-mixing assumption on the observations (which is stronger than $\alpha$-mixing) and an extra condition on the $\beta$-mixing coefficients (see also Gourieroux and Tenreiro (2001) and Kristensen (2011) for related results). The proof of a similar result in Vieu (1991) under an $\alpha$-mixing assumption and some complicated conditions on the mixing coefficients, see Theorem 2.2 there, is unfortunately incorrect: assumption (2.3b) in that paper is impossible to satisfy unless the observations are independent, formula (A.9) contains a mistake, and formula (9.2) requires some further conditions in order to hold. □

Let the estimator $\hat\theta_n$ of $\theta_0$ be defined by (3).

Remark 11. Under our assumptions in Section 2 the criterion function $R_n(\theta)$ from (4) is a continuous function of $\theta$, and hence by compactness of $\Theta$ a minimiser of $R_n(\theta)$ over $\theta \in \Theta$ exists. Consequently, so does the estimator $\hat\theta_n$, although it might be non-unique. Moreover, the estimator $\hat\theta_n$ will be a measurable function of the observations $Z_0, Z_1, \ldots, Z_n$, and hence the use of outer probability will not be needed when dealing with convergence properties of $\hat\theta_n$. Observe that $\hat\theta_n$, being defined through a minimisation procedure, is an M-estimator, see e.g. Chapter 5 in van der Vaart (2000). □



Remark 12. An approach to parameter estimation for stochastic differential equations that is based on estimating equations as described in Section 1 might in practice suffer from non-uniqueness of a parameter estimate, i.e. non-uniqueness of a root of the estimating equations. 'Wrong' selection of a root of the estimating equations might even render the estimator inconsistent, see e.g. remarks on pp. 70-71 in van der Vaart (2000). For a thorough discussion of the multiple root problem and possible remedies for it see Small et al. (2000). On the other hand, an approach based on optimisation of a criterion function, such as the one advocated in the present work, is less prone to failures of this type. □

Remark 13. In many interesting models, in particular in those where the drift coefficient $\mu(\cdot;\cdot)$ is linear in $\theta$, the estimator $\hat\theta_n$ will have a simple expression. For instance, one can check that for the Ornstein-Uhlenbeck process (8) with $\theta > 0$ unknown and $\sigma = 1$ known, the estimator $\hat\theta_n$ of the true parameter value $\theta_0$ is given by
$$\hat\theta_n = -\frac{1}{2}\,\frac{\int_{\mathbb{R}} x\,\hat\pi(x)\,\hat\pi'(x)\,w(x)\,dx}{\int_{\mathbb{R}} x^2\,\hat\pi^2(x)\,w(x)\,dx}. \tag{13}$$
Compare this expression to the rather complex and nonlinear score function for the same model as given on p. 77 in Kessler (2000), which is used as an estimating function when $\theta$ is estimated by the maximum likelihood method and which requires the use of some numerical root finding technique for the computation of the estimator. A general conclusion that can be drawn from this and other similar examples is that our approach will provide explicit estimators in many interesting examples. However, it should be noted that from the point of view of numerical stability, evaluation of the estimator through expressions such as (13) cannot be recommended in practice. Rather, one should approximate the criterion function $R_n(\cdot)$ through a Riemann sum and then compute from this approximation the estimator $\hat\theta_n$ as a weighted least squares estimator. When $\mu(\cdot;\cdot)$ is linear in $\theta$, the problem further reduces to the standard task of computing the weighted least squares estimator in a linear regression model. Finally, we remark that with a proper implementation of the nonparametric kernel estimators $\hat\pi(\cdot)$ and $\hat\pi'(\cdot)$, the computational effort for their evaluation is very modest; see e.g. Fan and Marron (1994). □
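The recommendation above can be sketched as follows, assuming (all our own choices) the same illustrative kernel as in Assumption 5 with $\alpha = 3$, a hypothetical weight $w(x) = (1 - x^2/c^2)^2$ on $[-c, c]$, bandwidth $h = 0.3$ and simulated data from (8) with $\theta_0 = 1$, $\sigma = 1$. The Riemann sum of $R_n(\theta)$ is $\sum_i w_i(\theta a_i - b_i)^2\,dx$ with $a_i = -x_i\hat\pi(x_i)$ and $b_i = \frac{1}{2}\hat\pi'(x_i)$, so $\hat\theta_n$ is a one-regressor weighted least squares fit, solved here with `numpy.linalg.lstsq`. On the same grid it coincides with the Riemann-sum version of (13).

```python
import numpy as np


def kernel(u):
    """K(u) = (105/64)(1 - u^2)^2 (1 - 3u^2) on [-1, 1], zero outside."""
    return np.where(np.abs(u) <= 1.0, (105.0 / 64.0) * (1 - u**2) ** 2 * (1 - 3 * u**2), 0.0)


def kernel_deriv(u):
    """K'(u) on [-1, 1], zero outside."""
    return np.where(np.abs(u) <= 1.0, (105.0 / 64.0) * (-10 * u + 28 * u**3 - 18 * u**5), 0.0)


def simulate_ou(theta, sigma, delta, n, rng):
    """Exact simulation of a stationary OU sample Z_j = X_{j delta}, j = 0..n."""
    a = np.exp(-theta * delta)
    sd_stat = sigma / np.sqrt(2.0 * theta)
    z = np.empty(n + 1)
    z[0] = rng.normal(0.0, sd_stat)
    for j in range(n):
        z[j + 1] = a * z[j] + sd_stat * np.sqrt(1.0 - a**2) * rng.normal()
    return z


rng = np.random.default_rng(2)
z = simulate_ou(theta=1.0, sigma=1.0, delta=0.5, n=5000, rng=rng)
h, c = 0.3, 1.5
x = np.linspace(-c, c, 301)
dx = x[1] - x[0]
w = (1.0 - (x / c) ** 2) ** 2                   # hypothetical C^1 weight on [-c, c]

# Smoothing step: kernel estimates of pi and pi' on the grid (grid points x observations).
u = (x[:, None] - z[None, :]) / h
pi_hat = kernel(u).sum(axis=1) / (z.size * h)
dpi_hat = kernel_deriv(u).sum(axis=1) / (z.size * h**2)

# (a) Riemann-sum version of the closed form (13).
theta_closed = -0.5 * np.sum(x * pi_hat * dpi_hat * w) / np.sum(x**2 * pi_hat**2 * w)

# (b) Matching step as weighted least squares: minimise sum_i w_i dx (theta a_i - b_i)^2.
sw = np.sqrt(w * dx)
design = (sw * (-x * pi_hat))[:, None]          # a_i = -x_i pi_hat(x_i), coefficient of theta
target = sw * 0.5 * dpi_hat                     # b_i = (1/2) pi_hat'(x_i)
theta_wls = np.linalg.lstsq(design, target, rcond=None)[0][0]
```

For a drift that is linear in several parameters, the same code extends by adding one column to `design` per parameter.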

Remark 14. A desire to have simple expressions for estimators based on estimating equations in Kessler (2000) at times leads to unnatural assumptions on the parameter space $\Theta$. For instance, in Section 6.4 in Kessler (2000), in the model
$$dX_t = -\theta X_t\,dt + \sqrt{\theta + X_t^2}\,dW_t,$$
in order to accommodate a simple-looking estimator of the true parameter $\theta_0$, the condition $\theta_0 > 7/2$ has to be assumed, while the more general condition $\theta_0 > 0$ appears to be more natural here. On the other hand, the assumption $\theta_0 > 7/2$ is not needed for our estimator $\hat\theta_n$, and $\theta_0 > 0$ suffices (this model formally does not fit into our framework, because the unknown parameter $\theta$ is also included in the dispersion coefficient of the stochastic differential equation; however, our asymptotic analysis holds for this model as well). □

It can be expected that as $n \to \infty$, for every $\theta \in \Theta$ the criterion function $R_n(\theta)$ converges in some appropriate sense to the limit criterion function
$$R(\theta) = \int \left( \mu(x;\theta)\pi(x;\theta_0) - \frac{1}{2}\left[\sigma^2(x)\pi(x;\theta_0)\right]' \right)^2 w(x)\,dx.$$
Note that by our assumptions $R(\theta_0) = 0$ and $R(\theta) \geq 0$ for $\theta \in \Theta$. Hence the parameter value $\theta_0$ is a minimiser of the asymptotic criterion function $R(\theta)$ over $\theta \in \Theta$. Under suitable identifiability conditions it can be ensured that $\theta_0$ is the unique minimiser of $R(\theta)$. Next, if convergence of $R_n(\theta)$ to $R(\theta)$ is strong enough, a minimiser of $R_n(\theta)$ will converge to a minimiser of $R(\theta)$. Said another way, $\hat\theta_n$ will be consistent for $\theta_0$. This is a standard approach to proving consistency of M-estimators, see e.g. Section 5.2 in van der Vaart (2000).

In order to carry out the above programme for the proof of consistency of $\hat\theta_n$, we need the drift coefficient $\mu(\cdot;\cdot)$ and the dispersion coefficient $\sigma(\cdot)$ to satisfy certain regularity conditions. These are listed in Section 2. Then the following theorem holds true.

Theorem 1. Under the assumptions of Section 2 and the additional identifiability condition
$$\forall \varepsilon > 0: \quad \inf_{\theta:\, |\theta - \theta_0| \geq \varepsilon} R(\theta) > R(\theta_0), \tag{14}$$
the estimator $\hat\theta_n$ is weakly consistent: $\hat\theta_n \xrightarrow{\mathbb{P}} \theta_0$.
Remark 15. The identifiability condition (14) is standard in M-estimation, see e.g. the discussion in Section 5.2 in van der Vaart (2000). It means that the point of minimum of the asymptotic criterion function is a well-separated point of minimum. Since under our conditions the asymptotic criterion function $R(\theta)$ is a continuous function of $\theta$ and $\Theta$ is compact, uniqueness of a global minimiser of $R(\theta)$ over $\Theta$ implies (14), cf. Problem 5.27 on p. 84 in van der Vaart (2000). As one particular example, one can check that condition (14) is satisfied for the Ornstein-Uhlenbeck process (8), assuming that $\theta$ is unknown, while $\sigma$ is known. □

Theorem 2. Let the assumptions of Theorem 1 hold and let additionally $\theta_0$ be an interior point of $\Theta$. If $h \asymp n^{-\gamma}$ with $\gamma = 1/(2\alpha)$ and $\ddot R(\theta_0) \neq 0$, then $\sqrt{n}(\hat\theta_n - \theta_0) = O_{\mathbb{P}}(1)$.
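To see how the bandwidth in Theorem 2 interacts with Lemma 1 and Assumption 6, substitute $h \asymp n^{-1/(2\alpha)}$ into the bounds (10) and (11). This is elementary arithmetic on our part, not a step quoted from the proofs.

```latex
\begin{aligned}
h^{2\alpha} \asymp n^{-1}, \qquad \frac{1}{nh^{2}} \asymp n^{-1+1/\alpha}
  &\;\Longrightarrow\; \text{right-hand side of (10)} \asymp n^{-1+1/\alpha}, \\
h^{2(\alpha-1)} \asymp n^{-1+1/\alpha}, \qquad \frac{1}{nh^{4}} \asymp n^{-1+2/\alpha}
  &\;\Longrightarrow\; \text{right-hand side of (11)} \asymp n^{-1+2/\alpha}, \\
nh^{4} \asymp n^{1-2/\alpha} \to \infty \quad (\alpha \geq 3)
  &\;\Longrightarrow\; \text{Assumption 6 holds.}
\end{aligned}
```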

Remark 16. The assumption $\ddot R(\theta_0)\neq0$ is satisfied in a number of important examples, for instance in the case of the Ornstein-Uhlenbeck process (5) with $\theta>0$ unknown and $\sigma$ known. □

Remark 17. Under appropriate conditions, the method studied in the present work can also handle the case where the drift coefficient $\mu(\cdot)$ does not depend on the parameter $\theta$, while the dispersion coefficient does. □

Remark 18. In the present paper we assumed that the dispersion coefficient $\sigma(\cdot)$ was known. In practice this is not always a realistic assumption. A possible extension of the smooth and match method to this more general setting is to treat $\sigma(\cdot)$ as a completely unknown function and estimate it nonparametrically. One would then define an estimator of the parameter of interest $\theta_0$ again via expression (3), but with $\sigma(\cdot)$ replaced by its nonparametric estimator $\hat\sigma(\cdot)$ in $R_n(\theta)$. Under appropriate assumptions this approach should again yield a $\sqrt n$-consistent estimator of $\theta_0$, although some nontrivial technicalities can be anticipated. □

Remark 19. Theorem 2 also holds for bandwidth sequences $h\asymp n^{-\gamma}$ with $\gamma$ other than $1/(2\alpha)$. However, $\gamma$ cannot be arbitrary, for this might violate the consistency of $\hat\pi(\cdot)$ and $\hat\pi'(\cdot)$, see Lemma 1. The condition on the bandwidth sequence in the statement of Theorem 2 is asymptotic in nature and is not directly applicable in practice. In practical applications a simple method called the quasi-optimality method is likely to produce reasonable results, see e.g. Bauer and Reiß (2008) for more information. See also the results of the simulation examples considered in Section 7. □
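The quasi-optimality rule, as it is used later in the simulation section, can be sketched as follows. One computes $\hat\theta_{n,h}$ on a grid of bandwidths and keeps the bandwidth after which the estimate changes least. The geometric grid and the function names are illustrative assumptions, and `estimate` stands for any routine returning $\hat\theta_{n,h}$.

```python
def geometric_grid(h_max, ratio, count):
    """Bandwidths h_k = h_max * ratio**k, k = 0, ..., count - 1 (an illustrative grid)."""
    return [h_max * ratio ** k for k in range(count)]

def quasi_optimal(bandwidths, estimate):
    """Quasi-optimality rule: return (h_k, theta_{n,h_k}) for the k minimising
    |theta_{n,h_{k+1}} - theta_{n,h_k}| over consecutive bandwidths."""
    if len(bandwidths) < 2:
        raise ValueError("need at least two bandwidths")
    thetas = [estimate(h) for h in bandwidths]
    k_best = min(range(len(bandwidths) - 1),
                 key=lambda k: abs(thetas[k + 1] - thetas[k]))
    return bandwidths[k_best], thetas[k_best]

if __name__ == "__main__":
    # Toy values standing in for theta_{n,h}; in practice estimate(h) would
    # recompute the smooth and match estimator with bandwidth h.
    toy = {0.1: 3.0, 0.2: 2.5, 0.3: 2.45, 0.4: 1.0}
    print(quasi_optimal([0.1, 0.2, 0.3, 0.4], toy.__getitem__))   # (0.2, 2.5)
```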

5. Asymptotic normality 

Examination of the proof of Lemma 4 in Appendix A, on which the proof of Lemma 3 and eventually that of Theorem 2 rely, shows that under appropriate extra conditions not only $\sqrt n$-consistency of the estimator $\hat\theta_n$, but also its asymptotic normality can be established.

Let
$$v(x)=2\mu(x;\theta_0)\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)+\big[\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\big]'\sigma^2(x).$$
The following result holds true.

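Theorem 3. Let the assumptions of Theorem 1 hold (with the corresponding assumption strengthened to the requirement $\alpha>4$) and let additionally $\theta_0$ be an interior point of $\Theta$. Assume that $h\asymp n^{-\gamma}$ with
$$\frac{1}{2\alpha}<\gamma<\frac18,$$
$$\ddot R(\theta_0)\neq0,\qquad \operatorname{Var}[v(Z_0)]+2\sum_{j=1}^{\infty}\operatorname{Cov}[v(Z_0),v(Z_j)]>0,\qquad \|v^{(\alpha)}\|_\infty<\infty, \tag{15}$$
and that for some $\delta>0$,
$$E\big[|Z_0|^{2+\delta}\big]<\infty,\qquad \sum_{k=1}^{\infty}\big(\alpha_\Delta(k)\big)^{\delta/(2+\delta)}<\infty. \tag{16}$$
Then
$$\sqrt{n+1}\,(\hat\theta_n-\theta_0)\xrightarrow{D}N(0,\tau^2).$$
Here
$$\tau^2=\frac{\operatorname{Var}[v(Z_0)]+2\sum_{j=1}^{\infty}\operatorname{Cov}[v(Z_0),v(Z_j)]}{(\ddot R(\theta_0))^2}.$$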
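In practice the series $\operatorname{Var}[v(Z_0)]+2\sum_{j\ge1}\operatorname{Cov}[v(Z_0),v(Z_j)]$ in Theorem 3 is unknown. One common plug-in approach, which is not discussed in the paper and is shown here only as a sketch, truncates the series at a lag $L$ and replaces the covariances by sample autocovariances of $v(Z_0),\dots,v(Z_n)$. Since $v$ itself involves $\theta_0$ and $\pi(\cdot;\theta_0)$, its values would also have to be estimated. The choice of $L$ is left to the user.

```python
def long_run_variance(values, max_lag):
    """Truncated estimate of Var[v(Z_0)] + 2 * sum_{j=1}^{max_lag} Cov[v(Z_0), v(Z_j)]
    from values = [v(Z_0), ..., v(Z_n)], using sample autocovariances with divisor len(values)."""
    n = len(values)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(values)")
    mean = sum(values) / n
    dev = [v - mean for v in values]

    def autocov(k):
        # lag-k sample autocovariance
        return sum(dev[i] * dev[i + k] for i in range(n - k)) / n

    return autocov(0) + 2.0 * sum(autocov(k) for k in range(1, max_lag + 1))
```

For example, `long_run_variance([1.0, 2.0, 3.0, 4.0], 1)` combines the sample variance 1.25 with twice the lag-one autocovariance 0.3125, giving 1.875.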

6. One-step Newton-Raphson type procedure 

Although according to Theorems 2 and 3 the estimator $\hat\theta_n$ is $\sqrt n$-consistent and even asymptotically normal, it is not necessarily asymptotically the best one. In the present model and observation scheme this is typically also the case for Z-estimators based on martingale estimating equations. Here we interpret the asymptotically best estimator as the one that is regular and has the smallest possible asymptotic variance among all regular estimators, see e.g. Chapter 8 in van der Vaart (2000) for an exposition of the asymptotic efficiency theory in the i.i.d. setting. Under regularity conditions the maximum likelihood estimator achieves the efficiency bound. As far as Z-estimators in diffusion models are concerned, one line of research in the literature is to choose estimating equations within a certain class of functions in an optimal way, see e.g. Bibby et al. (2010), Jacobsen (2001) and Kessler (2000). However, most of the work in this direction deals with the high frequency data setting, where $\Delta=\Delta_n\to0$ as $n\to\infty$. In our case, an optimal choice of the estimating equations would correspond to an optimal choice of the weight function $w(\cdot)$ within a certain class of weight functions. This is not an easy problem to solve, and it is a priori not clear whether this approach would lead to a simple and feasible optimal weight function $w_{\rm opt}$.

A possibly better and more direct approach to improving the asymptotic performance of the estimator $\hat\theta_n$ is to use it as the starting point of a one-step Newton-Raphson type procedure. The idea is well known in statistics, see e.g. Section 5.7 in van der Vaart (2000), and is as follows. Consider an estimating equation $\Psi_n(\theta)=0$. Given a preliminary estimator $\hat\theta_n$, define a one-step estimator $\tilde\theta_n$ of $\theta_0$ as the solution in $\theta$ to the equation
$$\Psi_n(\hat\theta_n)+\dot\Psi_n(\hat\theta_n)(\theta-\hat\theta_n)=0, \tag{17}$$
where $\dot\Psi_n(\cdot)$ is the derivative of $\Psi_n(\cdot)$ with respect to $\theta$. This corresponds to replacing $\Psi_n(\theta)$ with its tangent at $\hat\theta_n$. When this step is iterated several times, each time using the previously found solution to (17) as the new starting point, one obtains what numerical analysis calls the Newton (or Newton-Raphson) method, see e.g. Section 2.3 in Burden and Faires (2000). This method is used to find zeroes of nonlinear equations. In statistics, on the other hand, just one such iteration suffices to obtain an estimator that is asymptotically as good as the one defined by the estimating equation $\Psi_n(\theta)=0$, provided the preliminary estimator is already $\sqrt n$-consistent (a precise result can be found in Theorem 5.45 in van der Vaart (2000)). A computational advantage of the one-step approach over a more direct maximum likelihood approach is that a preliminary $\sqrt n$-consistent estimator is often easy to compute, while the computational time required for one Newton-Raphson type iteration step is negligible.
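A minimal sketch of the one-step update (17) follows, together with its iteration, which is plain Newton-Raphson. The functions `psi` and `dpsi` stand for a user-supplied estimating function and its derivative. They are hypothetical placeholders, not a specific estimating function from the paper.

```python
def one_step(psi, dpsi, theta_prelim):
    """Solve psi(t) + dpsi(t) * (theta - t) = 0 for theta, with t = theta_prelim (cf. (17))."""
    slope = dpsi(theta_prelim)
    if slope == 0:
        raise ZeroDivisionError("derivative of the estimating function vanishes")
    return theta_prelim - psi(theta_prelim) / slope

def newton(psi, dpsi, theta, iterations):
    """Iterating the one-step map is the Newton-Raphson method for psi(theta) = 0."""
    for _ in range(iterations):
        theta = one_step(psi, dpsi, theta)
    return theta

if __name__ == "__main__":
    # Toy estimating function, linear in theta: one step lands exactly on its root 1.0
    print(one_step(lambda t: 5.0 - 5.0 * t, lambda t: -5.0, 3.0))   # 1.0
```

For a linear estimating function a single step is exact. For nonlinear ones, the statistical point made above is that one step from a $\sqrt n$-consistent start already has the same first-order asymptotics as the exact root.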

Under suitable conditions one can use as $\Psi_n$ the martingale estimating functions, see e.g. Bibby et al. (2010), or even the score function $S_n(\theta)$ (i.e. the gradient of the log-likelihood function with respect to the unknown parameter $\theta$), provided the required derivatives of $\Psi_n$ can be evaluated, either analytically or numerically, in a quick and numerically stable way. The estimator $\hat\theta_n$ can thus be upgraded to an asymptotically efficient one. We omit a detailed discussion and a precise statement to save space. We simply note that the regularity conditions required to justify the one-step method are mild enough in our case; for example, they are satisfied in the case of the Ornstein-Uhlenbeck process.



7. Simulations 



In this section we present the results of a small simulation study that we performed using the Ornstein-Uhlenbeck process (5) as a test model. This study is in no way exhaustive, and the results merely serve as an illustration of the theoretical results from the previous sections.

Three required components for the construction of our estimator $\hat\theta_n$ from (3) are the weight function $w(\cdot)$, the kernel $K$ and the bandwidth $h$. As a weight function we used a suitably rescaled version of the function
$$x\mapsto\begin{cases}1, & |x|\le c,\\ \exp\big[-\beta\exp[-\beta/(|x|-c)^2]/(|x|-1)^2\big], & c<|x|<1,\\ 0, & |x|\ge1,\end{cases}$$


with constants $c$ and $\beta$ equal to 0.7 and 0.5, respectively. This weight function was already used in simulation examples in Gugushvili and Klaassen (2012). The rationale for its use is simple: $w$ is equal to one on the greater part of its support, which comes in handy in computations, and at the same time it is smooth. As a kernel we used
$$K(x)=\Big(\frac{105}{64}-\frac{315}{64}x^2\Big)(1-x^2)^2\,1_{[|x|\le1]},$$
which was also employed in simulation examples in Gugushvili and Klaassen (2012) and yielded good results there. Finally, in all our examples the bandwidth was selected through the so-called quasi-optimality approach. We computed the estimates $\hat\theta_n=\hat\theta_{n,h}$ for a range of different bandwidths $h$ and then picked the one that brought the least change to the next estimate. In greater detail, for a sequence of bandwidths $\{h_k\}$ we chose the bandwidth $\hat h$ such that
$$\hat h=\operatorname*{argmin}_{h_k}\big|\hat\theta_{n,h_{k+1}}-\hat\theta_{n,h_k}\big|,$$
Table 1. Mean squared errors for the estimates $\hat\theta_n$, $\tilde\theta_n$, $\theta_n^*$, $\check\theta_n$ together with the optimal value EB obtained from the asymptotic efficiency bound in the case of the Ornstein-Uhlenbeck process (5) with $\theta_0=2$ and $\sigma=1$.

$\Delta$   $n$   $\hat\theta_n$   $\tilde\theta_n$   $\theta_n^*$   $\check\theta_n$   EB
0.01       99    1.900            11.24              8.545          11.28              4.001
           199   2.152            3.774              3.474          3.776              2.000
0.05       99    1.061            1.384              1.371          1.394              0.803
           199   0.578            0.647              0.615          0.651              0.401
0.1        99    0.663            0.697              0.677          0.701              0.405
           199   0.291            0.204              0.206          0.205              0.203
1          99    0.155            0.067              0.079          0.070              0.080
           199   0.093            0.040              0.042          0.040              0.040



and next computed the estimate $\hat\theta_n=\hat\theta_{n,\hat h}$. In order not to clutter the notation, in the sequel we will omit the dependence of $\hat\theta_n$ on $\hat h$ and will simply write $\hat\theta_n$. Bauer and Reiß (2008) contains a theoretical justification for this method of smoothing parameter selection in nonparametric estimation problems.
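The weight function and kernel described above can be coded as follows. The sketch also checks, in exact rational arithmetic, that the kernel integrates to one and that its first three moments vanish, i.e. that it is a kernel of order 4. The rescaling $w(x)=w_0(x/1.4)$, which matches the support $[-1.4,1.4]$ reported for the simulations, is our reading of "suitably rescaled". Names such as `w0` and `kernel_moment` are hypothetical.

```python
import math
from fractions import Fraction

C_CONST, BETA = 0.7, 0.5   # constants c and beta from the text
HALF_WIDTH = 1.4           # support [-1.4, 1.4] of w used in the simulations

def w0(x, c=C_CONST, beta=BETA):
    """Unscaled weight: 1 on [-c, c], smooth transition for c < |x| < 1, 0 for |x| >= 1."""
    ax = abs(x)
    if ax <= c:
        return 1.0
    if ax >= 1.0:
        return 0.0
    return math.exp(-beta * math.exp(-beta / (ax - c) ** 2) / (ax - 1.0) ** 2)

def w(x):
    """Rescaled weight supported on [-HALF_WIDTH, HALF_WIDTH] (our reading of 'rescaled')."""
    return w0(x / HALF_WIDTH)

def K(u):
    """Kernel (105/64 - 315/64 u^2)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    if abs(u) > 1.0:
        return 0.0
    return (105.0 / 64.0 - 315.0 / 64.0 * u * u) * (1.0 - u * u) ** 2

# The same kernel expanded as a polynomial: (105/64)(1 - 5u^2 + 7u^4 - 3u^6) on [-1, 1]
K_COEFFS = {0: Fraction(105, 64), 2: Fraction(-525, 64),
            4: Fraction(735, 64), 6: Fraction(-315, 64)}

def kernel_moment(j):
    """Exact value of int_{-1}^{1} u^j K(u) du."""
    total = Fraction(0)
    for p, a in K_COEFFS.items():
        q = p + j
        if q % 2 == 0:   # odd powers integrate to zero over [-1, 1]
            total += a * Fraction(2, q + 1)
    return total

if __name__ == "__main__":
    # moments j = 0..4 take the values 1, 0, 0, 0 and -1/33
    print([kernel_moment(j) for j in range(5)])
```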

Our goal was to compare the behaviour of four estimators. The first is our estimator $\hat\theta_n$. The second is the one-step estimator $\tilde\theta_n$, which uses $\hat\theta_n$ as a preliminary estimator. The third is the estimator $\theta_n^*$ based on a simple estimating function from formula (29) in Kessler (2000), given by the expression



9 v^n-l vl ' 

The fourth is the maximum likelihood estimator $\check\theta_n$. In the case of the Ornstein-Uhlenbeck process the practical performance of $\check\theta_n$ is quite good, and the loss in asymptotic efficiency of $\theta_n^*$ in comparison to $\check\theta_n$ is small. Competing with these two estimators was therefore a tough task for our estimator $\hat\theta_n$.



All the computations were performed in Wolfram Mathematica 8.0, see Wolfram Research, Inc. (2010). Simulating samples from the Ornstein-Uhlenbeck process is straightforward, since it is an AR(1) process. We took $\theta_0=2$ and $\sigma=1$ and simulated from the process $X$ samples of sizes 100 and 200 (thus $n=99$ and $199$) with intervals between successive observations $\Delta=0.01,0.05,0.1$ and $1$.
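Because the process is AR(1), exact simulation takes only a few lines. The sketch below assumes that model (5) reads $dX_t=-\theta X_t\,dt+\sigma\,dW_t$ and starts the chain in its stationary law $N(0,\sigma^2/(2\theta))$. It is an illustration, not the Mathematica code used for Table 1.

```python
import math
import random

def simulate_ou(theta, sigma, delta, size, rng=None):
    """Exact simulation of dX_t = -theta X_t dt + sigma dW_t at times 0, delta, ...,
    (size - 1) * delta, started from the stationary law N(0, sigma^2 / (2 theta))."""
    if rng is None:
        rng = random.Random()
    phi = math.exp(-theta * delta)                  # AR(1) coefficient
    stat_sd = sigma / math.sqrt(2.0 * theta)        # stationary standard deviation
    innov_sd = stat_sd * math.sqrt(1.0 - phi * phi) # sd of X_{t+delta} given X_t
    x = rng.gauss(0.0, stat_sd)
    path = [x]
    for _ in range(size - 1):
        x = phi * x + rng.gauss(0.0, innov_sd)
        path.append(x)
    return path

if __name__ == "__main__":
    # One sample as in the study: theta_0 = 2, sigma = 1, 100 observations (n = 99)
    sample = simulate_ou(2.0, 1.0, 0.05, 100, random.Random(42))
    print(len(sample))   # 100
```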

The mean squared error was used as the criterion for comparing the estimators. For fixed $\Delta$ and $n$ we generated $k=200$ different samples and computed the estimates $\hat\theta_n$, $\tilde\theta_n$, $\theta_n^*$ and $\check\theta_n$ for each. Then, for each of the four estimators, we evaluated the mean squared error over the $k=200$ estimates, that is, the sum of the sample variance and the squared sample bias (the sample mean minus the true parameter value $\theta_0=2$). The support of the weight function $w(\cdot)$ was taken to be the interval $[-1.4,1.4]$, which roughly corresponds to the interval $[-3s_n,3s_n]$, where $s_n$ is the sample standard deviation of the observations. The results of our simulations are reported in Table 1. There we also included the theoretical optimal value EB for the mean squared error that follows from the asymptotic efficiency bound, see Example 3.2 and formula (12) in Kessler (2000).

One conclusion (modulo Monte Carlo errors) suggests itself from this table. For small $\Delta$, the estimator $\hat\theta_n$ seems either to outperform the other estimators or to perform just as well as they do. Once $\Delta$ and $n$ are sufficiently large, however, it is itself outperformed by the other estimators (our conclusions are also supported by some other simulations not reported here). Curiously enough, for $\Delta=0.01$ and $n=99$ the estimator $\hat\theta_n$ beats the asymptotic efficiency bound, although of course its performance is not (and cannot be) particularly good in this case. It is also interesting to note that the maximum likelihood estimator is not the best estimator in all cases. This should not be surprising, for its superiority over other estimators holds in the asymptotic sense only; it is also known to be strongly biased for small $n\Delta$, see e.g. Tang and Chen. Whenever the maximum likelihood estimator $\check\theta_n$ performs well, so does the one-step estimator $\tilde\theta_n$, and in general the two seem to yield virtually indistinguishable results. Another general remark is that for fixed $n$ all the estimators tend to perform better for larger values of $\Delta$. An intuitive explanation is that increasing $\Delta$ decreases the degree of dependence between different observations. In the case of the Ornstein-Uhlenbeck process, the marginal distributions of $X$ also contain enough information on the parameter $\theta_0$, so weaker dependence improves the estimation quality.
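The Monte Carlo criterion used above can be sketched as follows. We assume the sample variance uses divisor $k$, in which case "variance plus squared bias" coincides with the average squared deviation from the true value. With divisor $k-1$ the result differs slightly.

```python
def mean_squared_error(estimates, true_value):
    """Sample variance (divisor k) plus squared sample bias over k Monte Carlo replications."""
    k = len(estimates)
    mean = sum(estimates) / k
    variance = sum((e - mean) ** 2 for e in estimates) / k
    bias = mean - true_value
    return variance + bias ** 2

if __name__ == "__main__":
    # equals the average squared error (1 + 0 + 4) / 3
    print(mean_squared_error([1.0, 2.0, 4.0], 2.0))
```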

In conclusion, our simulation study used a very simple bandwidth selector and a weight function $w(\cdot)$ chosen primarily for simplicity. Keeping this in mind, the performance of our estimator $\hat\theta_n$ can be deemed satisfactory.



8. Proofs 

Proof of Lemma 1. We will only prove (10), as the proof of (11) uses similar arguments. By a standard decomposition of the weighted mean integrated squared error into the sum of the weighted integrated squared bias and the weighted integrated variance, we have
$$\begin{aligned}E\Big[\int(\hat\pi(x)-\pi(x;\theta_0))^2w(x)\,dx\Big]&=\int\big(E[\hat\pi(x)]-\pi(x;\theta_0)\big)^2w(x)\,dx+E\Big[\int\big(\hat\pi(x)-E[\hat\pi(x)]\big)^2w(x)\,dx\Big]\\&=T_1+T_2.\end{aligned}\tag{18}$$

By the assumptions of the lemma combined with Proposition 1.5 in Tsybakov (2009), it holds that
$$T_1\le C_1h^{2\alpha} \tag{19}$$
for some constant $C_1>0$ not depending on $n$. Next, denote
$$Y(Z_j,x)=\frac1hK\Big(\frac{x-Z_j}{h}\Big)-E\Big[\frac1hK\Big(\frac{x-Z_j}{h}\Big)\Big]. \tag{20}$$
Then
$$T_2=\frac{1}{(n+1)^2}\sum_{j=0}^{n}E\Big[\int Y^2(Z_j,x)w(x)\,dx\Big]+\frac{2}{(n+1)^2}\sum_{0\le i<j\le n}E\Big[\int Y(Z_i,x)Y(Z_j,x)w(x)\,dx\Big]=T_3+T_4$$
holds.

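By Proposition 1.4 in Tsybakov (2009) we have
$$T_3\le\frac{\|w\|_\infty}{(n+1)h}\int K^2(u)\,du. \tag{21}$$
Now note that $\|Y(\cdot,\cdot)\|_\infty\le2\|K\|_\infty/h$. Consequently, by Lemma 3 on p. 10 in Doukhan (1994),
$$\big|E[Y(Z_i,x)Y(Z_j,x)]\big|\le\frac{16\|K\|_\infty^2}{h^2}\,\alpha_\Delta(|j-i|).$$
Thus
$$T_4\le\frac{32\|K\|_\infty^2\|w\|_1}{(n+1)^2h^2}\sum_{0\le i<j\le n}\alpha_\Delta(j-i). \tag{22}$$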

Working out the sum on the right-hand side, we get
$$\sum_{0\le i<j\le n}\alpha_\Delta(j-i)=\sum_{k=1}^{n}(n+1-k)\,\alpha_\Delta(k)\le(n+1)\sum_{k=1}^{\infty}\alpha_\Delta(k).$$
The equality follows by counting, for each $k=1,\dots,n$, the pairs $(i,j)$ with $j-i=k$, and the inequality from the trivial observation that $n+1-k\le n+1$. Note that the sum on the right-hand side of the last display is finite by Assumption 3. The above display, the fact that $T_2=T_3+T_4$ and the bounds (21) and (22) imply that
$$T_2\le\frac{C_2}{(n+1)h^2} \tag{23}$$
for some constant $C_2>0$ not depending on $n$ (for all $n$ large enough that $h\le1$). The statement (10) follows from decomposition (18) combined with formulae (19) and (23). In view of the remark made at the beginning of the proof, this completes the proof of the lemma. □
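The counting identity used in this proof can be checked mechanically. In the sketch below, the geometric sequence standing in for the mixing coefficients $\alpha_\Delta(k)$ is hypothetical. The identity itself holds for any sequence.

```python
from fractions import Fraction

def pair_sum(alpha, n):
    """Sum of alpha(j - i) over all pairs 0 <= i < j <= n."""
    return sum(alpha(j - i) for i in range(n + 1) for j in range(i + 1, n + 1))

def lag_sum(alpha, n):
    """Same sum grouped by lag: lag k = j - i occurs for exactly n + 1 - k pairs."""
    return sum((n + 1 - k) * alpha(k) for k in range(1, n + 1))

def geometric_alpha(k):
    """Hypothetical mixing coefficients alpha(k) = 2**(-k), used only for the check."""
    return Fraction(1, 2 ** k)

if __name__ == "__main__":
    for n in range(10):
        lhs, rhs = pair_sum(geometric_alpha, n), lag_sum(geometric_alpha, n)
        bound = (n + 1) * sum(geometric_alpha(k) for k in range(1, n + 1))
        assert lhs == rhs <= bound
    print("identity and bound verified for n = 0, ..., 9")
```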

Proof of Theorem 1. We first settle the issue of measurability of $\hat\theta_n$. By Lemma 2 in Jennrich (1969), two properties suffice. First, for each fixed $\theta\in\Theta$ the criterion function $R_n(\theta)$ must be a measurable function of the sample $Z_0,\dots,Z_n$. Second, for $(Z_0,\dots,Z_n)\in\mathbb R^{n+1}$ viewed as fixed, the function $R_n(\theta)$ must be continuous in $\theta$. Measurability follows easily from our assumptions. Continuity of $R_n(\theta)$ in $\theta$ holds because, under our conditions, $R_n(\theta)$ is in fact three times differentiable with respect to $\theta$, by the corollary on p. 74 in Whittaker and Watson (2006) and by de la Vallée Poussin's test on p. 72 there. This follows by a tedious but straightforward verification of the assumptions made in that corollary. Thus in the convergence considerations we do not need to appeal to outer probability.
We will prove that
$$\sup_{\theta\in\Theta}|R_n(\theta)-R(\theta)|\xrightarrow{P}0. \tag{24}$$
The statement of the theorem will then follow from this fact and assumption (14) by Theorem 5.7 in van der Vaart (2000). That Chapter 5 in van der Vaart (2000) largely deals with the i.i.d. setting is immaterial here.
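By the Cauchy-Schwarz inequality we have
$$\begin{aligned}|R_n(\theta)-R(\theta)|&=\Big|\int\Big(\mu(x;\theta)\hat\pi(x)-\tfrac12[\sigma^2(x)\hat\pi(x)]'-\mu(x;\theta)\pi(x;\theta_0)+\tfrac12[\sigma^2(x)\pi(x;\theta_0)]'\Big)\\&\qquad\times\Big(\mu(x;\theta)\hat\pi(x)-\tfrac12[\sigma^2(x)\hat\pi(x)]'+\mu(x;\theta)\pi(x;\theta_0)-\tfrac12[\sigma^2(x)\pi(x;\theta_0)]'\Big)w(x)\,dx\Big|\\&\le\Big\{\int\Big(\mu(x;\theta)(\hat\pi(x)-\pi(x;\theta_0))-\tfrac12\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\Big)^2w(x)\,dx\Big\}^{1/2}\\&\qquad\times\Big\{\int\Big(\mu(x;\theta)(\hat\pi(x)+\pi(x;\theta_0))-\tfrac12\big[\sigma^2(x)(\hat\pi(x)+\pi(x;\theta_0))\big]'\Big)^2w(x)\,dx\Big\}^{1/2}\\&=T_1^{1/2}(\theta)\,T_2^{1/2}(\theta),\end{aligned}$$
with obvious definitions of $T_1(\theta)$ and $T_2(\theta)$. This inequality and Lemma 2 from Appendix A then yield (24). In view of the remarks made at the beginning of this proof, this completes the proof of the theorem. □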

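Proof of Theorem 2. Introduce the set
$$G_{n,\varepsilon}=\{|\hat\theta_n-\theta_0|<\varepsilon\}, \tag{25}$$
where $\varepsilon>0$ is some fixed number. Since $\theta_0$ is an interior point of $\Theta$, by choosing $\varepsilon$ small enough one can ensure that on the set $G_{n,\varepsilon}$ the estimator $\hat\theta_n$ also belongs to the interior of $\Theta$. Since $\hat\theta_n$ is a point of minimum of $R_n(\theta)$, it then follows that $1_{G_{n,\varepsilon}}\dot R_n(\hat\theta_n)=0$. From this and from the mean-value theorem we have
$$0=1_{G_{n,\varepsilon}}\dot R_n(\hat\theta_n)=1_{G_{n,\varepsilon}}\dot R_n(\theta_0)-1_{G_{n,\varepsilon}}\int_0^1\ddot R_n\big(\hat\theta_n+\lambda(\theta_0-\hat\theta_n)\big)\,d\lambda\,(\theta_0-\hat\theta_n).$$
Multiply the leftmost and rightmost terms of the above equality by $\sqrt n$. The statement of the theorem then follows from Lemmas 3 and 5 in Appendix A, together with $P(G_{n,\varepsilon})\to1$ (by Theorem 1) and $\ddot R(\theta_0)\neq0$. □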

Proof of Theorem 3. Consider the proofs of Theorem 2 and Lemmas 3-5 in Appendix A (note that our assumptions on $h$ and $\gamma$ are also used here), together with Slutsky's lemma (Lemma 2.8 in van der Vaart (2000)). They show that, to establish the theorem, it suffices to establish asymptotic normality of
$$\sqrt{n+1}\int v(x)(\hat\pi_n(x)-\pi(x;\theta_0))\,dx=\sqrt{n+1}\int v(x)\big(E[\hat\pi_n(x)]-\pi(x;\theta_0)\big)\,dx+\sqrt{n+1}\int v(x)\big(\hat\pi_n(x)-E[\hat\pi_n(x)]\big)\,dx.$$



By a standard argument (cf. the proof of Proposition 1.2 in Tsybakov (2009)) and by our assumption on $h$, the first term on the right-hand side of the above display converges to zero. For the second term, change the integration variable to $u=(x-Z_j)/h$ and rearrange the terms. It can then be rewritten as
$$\begin{aligned}\sqrt{n+1}\int v(x)\big(\hat\pi_n(x)-E[\hat\pi_n(x)]\big)\,dx&=\frac{1}{\sqrt{n+1}}\sum_{j=0}^{n}\{v(Z_j)-E[v(Z_j)]\}\\&\quad+\frac{1}{\sqrt{n+1}}\sum_{j=0}^{n}\int\{v(Z_j+hu)-v(Z_j)\}K(u)\,du\\&\quad-\sqrt{n+1}\,E\Big[\int\{v(Z_0+hu)-v(Z_0)\}K(u)\,du\Big].\end{aligned}$$



We want to show that the last two terms on the right-hand side of the above display vanish in probability as $n\to\infty$. By Chebyshev's inequality, it suffices to prove that
$$\sqrt{n+1}\,E\Big[\Big|\int\{v(Z_0+hu)-v(Z_0)\}K(u)\,du\Big|\Big]=o(1).$$



This can be done through a standard argument (cf. the proof of Proposition 1.2 in Tsybakov (2009)). Expand $v(Z_j+hu)$ into the Taylor polynomial of order $\alpha$ and use the fact that $K$ is a kernel of order $\alpha$. This shows that the left-hand side of the above display is of order $n^{1/2}h^{\alpha}=o(1)$. On the other hand, by Theorem 18.5.3 in Ibragimov and Linnik (1965),
$$\Big((n+1)\Big\{\operatorname{Var}[v(Z_0)]+2\sum_{j=1}^{\infty}\operatorname{Cov}(v(Z_0),v(Z_j))\Big\}\Big)^{-1/2}\sum_{j=0}^{n}\{v(Z_j)-E[v(Z_j)]\}\xrightarrow{D}N(0,1).$$
Combining the above results with Slutsky's lemma yields the statement of the theorem. □
Appendix A.

The present appendix contains a number of technical results used in the proofs of the main results of the paper in Section 8.

Lemma 2. Under the conditions of Theorem 1 we have
$$\sup_{\theta\in\Theta}T_1(\theta)=o_P(1) \tag{26}$$
and
$$\sup_{\theta\in\Theta}T_2(\theta)=O_P(1), \tag{27}$$
where $T_1(\theta)$ and $T_2(\theta)$ are the same as in the proof of Theorem 1.

Proof. We will only prove (26), because (27) can be proved by similar arguments. By the $C_2$-inequality and our assumptions we have
$$\sup_{\theta\in\Theta}T_1(\theta)\le2\int(\hat\pi(x)-\pi(x;\theta_0))^2\sup_{\theta\in\Theta}\mu^2(x;\theta)\,w(x)\,dx+\frac12\int\Big(\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\Big)^2w(x)\,dx.$$
A slight variation of Lemma 1 (with a suitable choice of the weight function $w(\cdot)$ there) then shows that the right-hand side converges to zero in probability. This completes the proof of the lemma. □

Lemma 3. Under the conditions of Theorem 2 we have
$$1_{G_{n,\varepsilon}}\sqrt n\,\dot R_n(\theta_0)=O_P(1),$$
where the set $G_{n,\varepsilon}$ is defined in (25).



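Proof. Differentiating the function $R_n(\theta)$ under the integral sign with respect to $\theta$, we obtain
$$1_{G_{n,\varepsilon}}\sqrt n\,\dot R_n(\theta_0)=1_{G_{n,\varepsilon}}2\sqrt n\int\Big(\mu(x;\theta_0)\hat\pi(x)-\frac12[\sigma^2(x)\hat\pi(x)]'\Big)\dot\mu(x;\theta_0)\hat\pi(x)w(x)\,dx.$$
In view of Assumption 8, the right-hand side can be rewritten as
$$\begin{aligned}&1_{G_{n,\varepsilon}}2\sqrt n\int\dot\mu(x;\theta_0)\hat\pi(x)w(x)\Big(\mu(x;\theta_0)(\hat\pi(x)-\pi(x;\theta_0))-\frac12\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\Big)dx\\&=1_{G_{n,\varepsilon}}2\sqrt n\int\mu(x;\theta_0)\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)(\hat\pi(x)-\pi(x;\theta_0))\,dx\\&\quad-1_{G_{n,\varepsilon}}\sqrt n\int\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\,dx\\&\quad+1_{G_{n,\varepsilon}}2\sqrt n\int\mu(x;\theta_0)\dot\mu(x;\theta_0)w(x)(\hat\pi(x)-\pi(x;\theta_0))^2\,dx\\&\quad-1_{G_{n,\varepsilon}}\sqrt n\int\dot\mu(x;\theta_0)w(x)(\hat\pi(x)-\pi(x;\theta_0))\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\,dx\\&=T_1+T_2+T_3+T_4.\end{aligned}$$
By Lemma 4 the terms $T_1$, $T_2$, $T_3$ and $T_4$ are $O_P(1)$. This completes the proof. □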

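Lemma 4. Let $T_1$, $T_2$, $T_3$ and $T_4$ be defined as in the proof of Lemma 3. Then each of them is $O_P(1)$.

Proof. We start by proving the statement of the lemma for $T_1$. We have
$$\begin{aligned}\int\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\mu(x;\theta_0)(\hat\pi(x)-\pi(x;\theta_0))\,dx&=\int\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\mu(x;\theta_0)\big(E[\hat\pi(x)]-\pi(x;\theta_0)\big)\,dx\\&\quad+\int\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\mu(x;\theta_0)\big(\hat\pi(x)-E[\hat\pi(x)]\big)\,dx.\end{aligned}\tag{28}$$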
By Proposition 1.2 in Tsybakov (2009) it holds that
$$\int\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\mu(x;\theta_0)\big(E[\hat\pi(x)]-\pi(x;\theta_0)\big)\,dx=O(h^{\alpha})=O(n^{-1/2}), \tag{29}$$
where the last equality follows from our assumption $h\asymp n^{-1/(2\alpha)}$. Next we will show that the second term on the right-hand side of (28) is $O_P(n^{-1/2})$. To that end it suffices to show that
$$\sqrt n\int\big(\hat\pi(x)-E[\hat\pi(x)]\big)v(x)\,dx=O_P(1) \tag{30}$$
for any function $v$ such that $\|v\|_\infty<\infty$, because by Assumptions 2 and 9
$$\|\dot\mu(\cdot;\theta_0)\pi(\cdot;\theta_0)w(\cdot)\mu(\cdot;\theta_0)\|_\infty<\infty.$$
We use Chebyshev's inequality, the fact that the $Z_j$'s are identically distributed, and the fact that $E[Y(Z_j,x)]=0$, where $Y(Z_j,x)$ is defined in (20). For an arbitrary constant $C>0$ this gives



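$$\begin{aligned}P\Big(\Big|\sqrt n\int\big(\hat\pi(x)-E[\hat\pi(x)]\big)v(x)\,dx\Big|>C\Big)&\le\frac{n}{C^2}\operatorname{Var}\Big[\frac{1}{n+1}\sum_{j=0}^{n}\int Y(Z_j,x)v(x)\,dx\Big]\\&\le\frac{1}{C^2}\operatorname{Var}\Big[\int Y(Z_0,x)v(x)\,dx\Big]\\&\quad+\frac{2}{C^2}\,\frac{1}{n+1}\sum_{0\le i<j\le n}E\Big[\int Y(Z_i,x)v(x)\,dx\int Y(Z_j,x)v(x)\,dx\Big].\end{aligned}\tag{31}$$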

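By a change of the integration variable it can be shown that
$$\Big|\int Y(Z_i,x)v(x)\,dx\Big|\le2\|v\|_\infty\|K\|_1, \tag{32}$$
which implies that
$$\frac{1}{C^2}\operatorname{Var}\Big[\int Y(Z_i,x)v(x)\,dx\Big]\le\frac{4\|v\|_\infty^2\|K\|_1^2}{C^2}. \tag{33}$$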
Furthermore, using (32), Lemma 3 on p. 10 in Doukhan (1994) gives for $i<j$
$$\Big|E\Big[\int Y(Z_i,x)v(x)\,dx\int Y(Z_j,x)v(x)\,dx\Big]\Big|\le16\|v\|_\infty^2\|K\|_1^2\,\alpha_\Delta(j-i).$$
By counting the cases when $j-i=k$ for $k=1,\dots,n$, it can be seen that
$$\begin{aligned}\frac{2}{C^2}\,\frac{1}{n+1}\sum_{0\le i<j\le n}E\Big[\int Y(Z_i,x)v(x)\,dx\int Y(Z_j,x)v(x)\,dx\Big]&\le\frac{32}{C^2}\,\frac{1}{n+1}\|v\|_\infty^2\|K\|_1^2\sum_{k=1}^{n}(n+1-k)\,\alpha_\Delta(k)\\&\le\frac{32}{C^2}\|v\|_\infty^2\|K\|_1^2\sum_{k=1}^{\infty}\alpha_\Delta(k).\end{aligned}$$
The sum in the rightmost term of the above display is finite by Assumption 3. The above display and (33) show that the left-hand side of (31) can be made arbitrarily small by selecting $C$ large, which shows that (30) holds. Formulae (28)-(30) then imply that $T_1$ is $O_P(1)$.

Next we treat $T_2$. By integration by parts and using Assumption 9,
$$T_2=1_{G_{n,\varepsilon}}\sqrt n\int\big[\dot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\big]'\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\,dx.$$
The right-hand side can be treated by exactly the same arguments as used above for $T_1$, and one can show that $T_2=O_P(1)$.
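We move to $T_3$. By Chebyshev's inequality,
$$P\Big(\Big|\sqrt n\int\mu(x;\theta_0)\dot\mu(x;\theta_0)w(x)(\hat\pi(x)-\pi(x;\theta_0))^2\,dx\Big|>C\Big)\le\frac{\sqrt n}{C}\,E\Big[\int|\mu(x;\theta_0)\dot\mu(x;\theta_0)|\,w(x)(\hat\pi(x)-\pi(x;\theta_0))^2\,dx\Big].$$
By a slight variation of the statement of Lemma 1 (replace $w(\cdot)$ in the statement with $|\mu(\cdot;\theta_0)\dot\mu(\cdot;\theta_0)|\,w(\cdot)$), the right-hand side of the above display is $o(1)$, and hence $T_3$ is $o_P(1)$.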

Finally, $T_4$ can be handled by the same argument as $T_3$. The Cauchy-Schwarz inequality gives
$$\begin{aligned}&\Big|\int\dot\mu(x;\theta_0)w(x)(\hat\pi(x)-\pi(x;\theta_0))\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\,dx\Big|\\&\le\Big\{\int\dot\mu^2(x;\theta_0)w(x)(\hat\pi(x)-\pi(x;\theta_0))^2\,dx\Big\}^{1/2}\Big\{\int w(x)\Big(\big[\sigma^2(x)(\hat\pi(x)-\pi(x;\theta_0))\big]'\Big)^2dx\Big\}^{1/2}.\end{aligned}$$
Arguments similar to those given above, together with Lemma 1, then show that the right-hand side is $O_P(n^{-1/2})$, and hence $T_4=O_P(1)$. This completes the proof of the lemma. □
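Lemma 5. Under the conditions of Theorem 2 we have
$$1_{G_{n,\varepsilon}}\int_0^1\ddot R_n\big(\hat\theta_n+\lambda(\theta_0-\hat\theta_n)\big)\,d\lambda\xrightarrow{P}\ddot R(\theta_0),$$
where the set $G_{n,\varepsilon}$ is defined in (25).

Proof. We have
$$1_{G_{n,\varepsilon}}\int_0^1\ddot R_n\big(\hat\theta_n+\lambda(\theta_0-\hat\theta_n)\big)\,d\lambda=1_{G_{n,\varepsilon}}\ddot R_n(\theta_0)+1_{G_{n,\varepsilon}}\int_0^1\Big(\ddot R_n\big(\hat\theta_n+\lambda(\theta_0-\hat\theta_n)\big)-\ddot R_n(\theta_0)\Big)\,d\lambda=T_1+T_2.$$
By Lemma 6 the term $T_1$ converges in probability to $\ddot R(\theta_0)$, while by Lemma 7 the term $T_2$ converges in probability to zero. This completes the proof. □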

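Lemma 6. For $T_1$ defined as in the proof of Lemma 5 and under the same conditions as in Lemma 5, we have $T_1\xrightarrow{P}\ddot R(\theta_0)$.

Proof. By consistency of $\hat\theta_n$, see Theorem 1, we have $1_{G_{n,\varepsilon}}\xrightarrow{P}1$. Furthermore,
$$\begin{aligned}\ddot R_n(\theta_0)&=2\int\dot\mu^2(x;\theta_0)\hat\pi^2(x)w(x)\,dx+2\int\mu(x;\theta_0)\ddot\mu(x;\theta_0)\hat\pi^2(x)w(x)\,dx\\&\quad-\int[\sigma^2(x)\hat\pi(x)]'\,\ddot\mu(x;\theta_0)\hat\pi(x)w(x)\,dx\\&=A_1+A_2+A_3.\end{aligned}\tag{34}$$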

We will treat each of the three terms on the right-hand side separately. First of all,
$$A_1=2\int\dot\mu^2(x;\theta_0)\pi^2(x;\theta_0)w(x)\,dx+2\int\dot\mu^2(x;\theta_0)\{\hat\pi^2(x)-\pi^2(x;\theta_0)\}w(x)\,dx=A_4+A_5.$$
We will show that $A_5$ is $o_P(1)$. By the Cauchy-Schwarz inequality combined with the $C_2$-inequality we have
$$|A_5|\le2\Big\{\int\dot\mu^2(x;\theta_0)(\hat\pi(x)-\pi(x;\theta_0))^2w(x)\,dx\Big\}^{1/2}\times\Big\{2\int\dot\mu^2(x;\theta_0)(\hat\pi(x)-\pi(x;\theta_0))^2w(x)\,dx+8\int\dot\mu^2(x;\theta_0)\pi^2(x;\theta_0)w(x)\,dx\Big\}^{1/2}.$$
The right-hand side is $o_P(1)$ by Lemma 1, and hence so is $A_5$. Thus
$$A_1=A_4+o_P(1)=2\int\dot\mu^2(x;\theta_0)\pi^2(x;\theta_0)w(x)\,dx+o_P(1). \tag{35}$$
Now we turn to $A_2$. By the same reasoning as used for $A_1$, one can show that
$$A_2=2\int\mu(x;\theta_0)\ddot\mu(x;\theta_0)\pi^2(x;\theta_0)w(x)\,dx+o_P(1). \tag{36}$$
Finally, a long and tedious computation, similar to the one used to study $A_1$ and omitted to save space, shows that
$$A_3=-\int[\sigma^2(x)\pi(x;\theta_0)]'\,\ddot\mu(x;\theta_0)\pi(x;\theta_0)w(x)\,dx+o_P(1). \tag{37}$$
The statement of the lemma follows upon collecting formulae (35)-(37) and using the representation (34). □
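The derivatives of $R_n$ used in (34) and in the proof of Lemma 3 follow from a direct computation, which we record here for the reader's convenience. For this derivation only, write $\hat e_n(x;\theta)=\mu(x;\theta)\hat\pi(x)-\frac12[\sigma^2(x)\hat\pi(x)]'$, so that $R_n(\theta)=\int\hat e_n^2(x;\theta)w(x)\,dx$. Differentiation under the integral sign is justified as in the text.

```latex
\begin{align*}
\partial_\theta \hat e_n(x;\theta) &= \dot\mu(x;\theta)\,\hat\pi(x),\qquad
\partial_\theta^2 \hat e_n(x;\theta) = \ddot\mu(x;\theta)\,\hat\pi(x),\\
\dot R_n(\theta) &= 2\int \hat e_n(x;\theta)\,\dot\mu(x;\theta)\hat\pi(x)\,w(x)\,dx,\\
\ddot R_n(\theta) &= 2\int \dot\mu^2(x;\theta)\hat\pi^2(x)w(x)\,dx
  + 2\int \hat e_n(x;\theta)\,\ddot\mu(x;\theta)\hat\pi(x)\,w(x)\,dx\\
 &= 2\int \dot\mu^2(x;\theta)\hat\pi^2(x)w(x)\,dx
  + 2\int \mu(x;\theta)\ddot\mu(x;\theta)\hat\pi^2(x)w(x)\,dx
  - \int [\sigma^2(x)\hat\pi(x)]'\,\ddot\mu(x;\theta)\hat\pi(x)w(x)\,dx .
\end{align*}
```

When $\hat\pi$ is replaced by $\pi(\cdot;\theta_0)$, the analogue of $\hat e_n(x;\theta_0)$ vanishes by the characterisation of the invariant density. The limits in (36) and (37) therefore cancel, and $\ddot R(\theta_0)=2\int\dot\mu^2(x;\theta_0)\pi^2(x;\theta_0)w(x)\,dx$.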

Lemma 7. For $T_2$ defined as in the proof of Lemma 5 and under the same conditions as in Lemma 5, we have $T_2\xrightarrow{P}0$.

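Proof. Denote $\Phi_n(\theta)=\ddot R_n(\theta)$. Using the mean-value theorem, we have the following chain of relations:
$$\begin{aligned}|T_2|&=1_{G_{n,\varepsilon}}\Big|\int_0^1\Big(\Phi_n\big(\hat\theta_n+\lambda(\theta_0-\hat\theta_n)\big)-\Phi_n(\theta_0)\Big)\,d\lambda\Big|\\&=1_{G_{n,\varepsilon}}\Big|\int_0^1(1-\lambda)\int_0^1\dot\Phi_n\big(\theta_0+\psi(1-\lambda)(\hat\theta_n-\theta_0)\big)\,d\psi\,d\lambda\Big|\,|\hat\theta_n-\theta_0|\\&\le1_{G_{n,\varepsilon}}\int_0^1\int_0^1\Big|\dot\Phi_n\big(\theta_0+\psi(1-\lambda)(\hat\theta_n-\theta_0)\big)\Big|\,d\psi\,d\lambda\;|\hat\theta_n-\theta_0|.\end{aligned}$$
Since $|\hat\theta_n-\theta_0|=o_P(1)$ by Theorem 1, in order to prove the lemma it suffices to show that
$$1_{G_{n,\varepsilon}}\int_0^1\int_0^1\Big|\dot\Phi_n\big(\theta_0+\psi(1-\lambda)(\hat\theta_n-\theta_0)\big)\Big|\,d\psi\,d\lambda=O_P(1). \tag{38}$$
Observe that
$$\begin{aligned}\dot\Phi_n(\theta)&=4\int\dot\mu(x;\theta)\ddot\mu(x;\theta)\hat\pi^2(x)w(x)\,dx+2\int\dot\mu(x;\theta)\ddot\mu(x;\theta)\hat\pi^2(x)w(x)\,dx\\&\quad+2\int\mu(x;\theta)\dddot\mu(x;\theta)\hat\pi^2(x)w(x)\,dx-\int[\sigma^2(x)\hat\pi(x)]'\,\dddot\mu(x;\theta)\hat\pi(x)w(x)\,dx\\&=A_1(\theta)+A_2(\theta)+A_3(\theta)+A_4(\theta),\end{aligned}$$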

where differentiation under the integral sign is justified by the corollary on p. 72 in Whittaker and Watson (2006), by de la Vallée Poussin's test on p. 72 there and by our assumptions. Next, insert the expression above into the left-hand side of formula (38). Denoting
$$\theta_{n,\psi,\lambda}=\theta_0+\psi(1-\lambda)(\hat\theta_n-\theta_0),$$
we see that we need to show that
$$1_{G_{n,\varepsilon}}\int_0^1\int_0^1|A_i(\theta_{n,\psi,\lambda})|\,d\psi\,d\lambda=O_P(1),\qquad i=1,\dots,4.$$
By appropriately selecting $\varepsilon$ in the definition of the set $G_{n,\varepsilon}$ in (25), one can ensure that for all $\lambda,\psi\in[0,1]$ the point $\theta_{n,\psi,\lambda}$ belongs to the interior of the parameter set $\Theta$. Keeping this in mind, we need to study the term
$$1_{G_{n,\varepsilon}}\int_0^1\int_0^1|A_i(\theta_{n,\psi,\lambda})|\,d\psi\,d\lambda \tag{39}$$
for $i=1$. The arguments for the other terms with $i=2,3,4$ are similar and are omitted.
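We have
$$1_{G_{n,\varepsilon}}\int_0^1\int_0^1|A_1(\theta_{n,\psi,\lambda})|\,d\psi\,d\lambda\le8\int\bar\mu_2(x)\bar\mu_3(x)(\hat\pi(x)-\pi(x;\theta_0))^2w(x)\,dx+8\int\bar\mu_2(x)\bar\mu_3(x)\pi^2(x;\theta_0)w(x)\,dx,$$
where $\bar\mu_2(x)=\sup_{\theta\in\Theta}|\dot\mu(x;\theta)|$ and $\bar\mu_3(x)=\sup_{\theta\in\Theta}|\ddot\mu(x;\theta)|$. From this and from Lemma 1 it is immediate that (39) is $O_P(1)$ for $i=1$. In light of the remarks made above, this completes the proof. □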